FOREWORD


I have great pleasure in announcing that in future Sankhyā: The Indian
Journal of Statistics will be published in two separate series. Series A will contain
articles with emphasis on methods and techniques, and Series B on applications, data,
and records.


I may recall briefly how the first number of Sankhyā was published. Since
1932, regular scientific meetings were being organised by the Indian Statistical Institute
at which statistical papers were presented. There was a pressing need for a journal
in which such papers could be published; and the Council of the Indian Statistical
Institute decided to start Sankhyā as the official organ of the Institute. The first
number of Sankhyā was issued in June 1933; and since then the Journal has been
published, by the Statistical Publishing Society, as the official organ of the Indian
Statistical Institute.


The meaning of the word Sankhyā and the scope of the Journal were explained
as follows in the Editorial published in Volume 1 of Sankhyā.

“We believe that the idea underlying the integral concepts of statistics finds
adequate expression in the ancient Indian word sankhyā. In Sanskrit the
usual meaning is ‘number’, but the original root meaning was ‘determinate
knowledge’.¹ In the Atharva-Veda, a derivative form sankhyata occurs both
in the sense of ‘well-known’ as well as ‘numbered’.

The history of the word Sankhyā shows the intimate connexion which has
existed for more than 3000 years in the Indian mind between ‘adequate
knowledge’ and ‘number’.²


¹ The word is derived from khyā (‘to perceive, view’; ‘to be known’, ‘to make well-known’ in
Monier Williams's Dictionary). The root meaning is ‘determinate knowledge’, ‘deliberation’ or ‘whatever
helps us in obtaining determinate knowledge’ according as the kṛt suffix is taken in the active or
instrumental form. From the latter phase is derived the technical meaning of ‘number’.

² Atharva-Veda, 4.25.2. It also occurs in 4.16.5 and 12.3.28. Winternitz after a full discussion
of the date of the Vedic age says “we shall probably have to date the beginning of this development about
2000 or 2500 B.C., and the end of it between 750 and 500 B.C.”: A History of Indian Literature (English
Translation, Calcutta University, 1927), Vol. I, p. 310. While the present form of the Atharva-Veda is
believed to be later than that of the Rg-Veda, much of its material is considered to be as old as, if not
older than, many portions of the Rg-Veda (Winternitz, p. 127). The word sankhyā was in common
use in the sense of number in the time of Pāṇini.
“As we interpret it, the fundamental aim of statistics is to give determinate
and adequate knowledge of reality with the help of numbers and numerical
analysis. The word Sankhyā embodies the same idea, and this is why we
have chosen this name for the Indian Journal of Statistics. The spirit
and outlook of Sankhyā will be universal, but its form and content must
necessarily be, to some extent, regional.


We shall keep the special needs of India in view without, however, restricting 
the scope of the Journal in any way. We shall naturally devote closer 
attention to the collection and analysis of data relating to India, but we shall 
try to study all Indian questions in relation to world problems. 


A research journal serves that narrow border land which separates the known
from the unknown, and it is not always possible to see clearly the lines of future
developments. We shall, therefore, invite papers of all kinds appraising them
only on the basis of observational accuracy and logical reasoning. We shall
publish carefully collected statistical materials irrespective of the subject even
if they have not received any analytic treatment. We shall pay special
attention to developments of the mathematical theory of statistics, and include
abstracts and expositions of important papers published elsewhere. We
shall try to help statistical researches on co-operative lines by bringing
workers in different parts of India in contact, and by providing a medium
for exchange of ideas. Bibliographies of Indian Statistical publications,
numerical tables tending to reduce the labour of computation, book reviews,
and notes and comments on current topics are some of the ways in which we
shall try to make Sankhyā useful to statistical workers in India. Knowing
that our resources are small we shall seek guidance and help from other
countries, and we shall welcome and thankfully receive papers from abroad.”


We have not been able to do all that we had in view. We are glad, however,
that an increasing volume of matter is being received in recent years. The Council
of the Indian Statistical Institute therefore decided that Sankhyā should be published
from 1960 in two separate series. We take this opportunity of reminding ourselves
of our original programme.


We thank the many readers and the subscribers to Sankhyā for their kind
support and encouragement during the last twenty-seven years without which Sankhyā
would not have occupied the position it holds today.


P. C. Mahalanobis
Editor 


SAMPLING THE REFERENCE SET 


By SIR RONALD A. FISHER
University of Adelaide 


1. THE REFERENCE SET OF A PROBABILITY STATEMENT 


Every test of significance involves probability statements, conditioned on
the truth of the hypothesis. Conceptually, these can be verified by sampling the
reference set which is their mathematical basis. The misapprehension has, however,
been widely promulgated that such sampling involves no more than a repetition of
the process by which the data to be tested came into existence. Easy examples
(Fisher, 1956-59), such as the test of significance of a linear regression, or the test of
proportionality in a two by two table, have frequently been cited to show that such
a method of “repeated sampling from the same population” is erroneous, and
irrelevant to the test of significance which it is proposed to verify. The recognition
of the appropriate reference set is an essential first step to understanding a test of
significance, and, therefore, to setting up an appropriate process of possible
verification. We may exemplify such a process using the long-disputed test of
significance for the difference between the means of two hypothetical populations,
the variances of which are not in any known ratio, when a random sample of each
is available.


2. INFERENCES FROM TWO NORMAL SAMPLES 


The means of the two samples will be represented by $\bar{x}_1$ and $\bar{x}_2$, where

$$(n_1+1)\,\bar{x}_1 = S(x_1),$$
$$(n_2+1)\,\bar{x}_2 = S(x_2);$$

the estimated variances of the means by $s_1^2$ and $s_2^2$, where

$$n_1(n_1+1)\,s_1^2 = S(x_1-\bar{x}_1)^2 = S_1,$$
$$n_2(n_2+1)\,s_2^2 = S(x_2-\bar{x}_2)^2 = S_2,$$

these estimates being based on $n_1$ and $n_2$ degrees of freedom respectively.


The test of significance must involve as parameters the known values $n_1$, $n_2$
and $s_1/s_2$; these are, therefore, known characteristics of the reference set. If a more
comprehensive set were first considered, the facts that $s_1/s_2$ is known to the observer,
and is not irrelevant to the probability statements to be made, and that the set in
which $s_1/s_2$ is constant contains no further relevant sub-set, may be taken as defining
the set which is to be sampled. If we are to erect a sampling process capable of
testing any one of the values tabulated by Behrens or Sukhatme, or others, for
Behrens’ test, the first step is to obtain random samples having the correct values for
this ratio.
3. VERIFICATION OF BEHRENS’ TEST


Now if $\sigma_1^2$ and $\sigma_2^2$ are the true variances of the populations sampled, the
distribution of

$$2z = \log \frac{(n_1+1)\,s_1^2\,\sigma_2^2}{(n_2+1)\,s_2^2\,\sigma_1^2}$$

is known in terms of $n_1$ and $n_2$. Let $z_P$ stand for that value for which
$$\Pr(z < z_P) = P,$$
and let us take for P a representative series of fractions, such as
$$P_i = (2i-1)/20{,}000$$
where i takes integral values from 1 to 10,000.
For each of these 10,000 values
$$\frac{\sigma_2^2}{\sigma_1^2} = \frac{(n_2+1)\,s_2^2}{(n_1+1)\,s_1^2}\,e^{2z_P},$$
and is, therefore, known.


We may now take two Normal populations having equal means, and variances
in the ratio required.


Successive samples of the same sizes as those constituting the data may
now be taken, but all will be rejected in which the experimental ratio of $s_1$ to $s_2$ does
not agree, within a specified tolerance, with that originally observed. For each value
of i the first which satisfies this condition may be taken as a random representative
of the appropriate reference set.


If $d_1$ is the tabular value to be verified, such random pairs of samples may be
classified as satisfying one of the three inequalities,

$$(\bar{x}_1-\bar{x}_2) < -d_1\sqrt{s_1^2+s_2^2},$$
$$-d_1\sqrt{s_1^2+s_2^2} \le (\bar{x}_1-\bar{x}_2) \le d_1\sqrt{s_1^2+s_2^2},$$
$$d_1\sqrt{s_1^2+s_2^2} < (\bar{x}_1-\bar{x}_2),$$

in which the values inserted for $\bar{x}_1$, $\bar{x}_2$, $s_1$ and $s_2$ are derived from the
experimental sampling. For tabular values of the 5% point, the expected numbers in
these three classes are

250, 9500, 250,

while, for values tabulated at the 1% point, the expectations are

50, 9900, 50

respectively.




As in other cases the verification need not be carried out; it is sufficient that
it is a precisely defined procedure which could be carried out with any degree of
precision, and with calculable consequences.
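
The procedure can nevertheless be sketched in a few lines. The following is a minimal illustration under stated assumptions, not a reproduction of any computation in the paper: the function names are hypothetical, the equally spaced fractions $P_i$ are replaced by random draws of the variance ratio $e^{2z}$ (generated as a ratio of chi-square variates), the default tolerance of 10 per cent is arbitrary, and the value of d to be tested is supplied by the user.

```python
import math
import random


def mean_and_var_of_mean(sample):
    """Return the sample mean and the estimated variance of that mean."""
    size = len(sample)  # size = n + 1, giving n degrees of freedom
    mean = sum(sample) / size
    ss = sum((x - mean) ** 2 for x in sample)
    return mean, ss / ((size - 1) * size)  # S(x - mean)^2 / (n (n + 1))


def sample_reference_set(n1, n2, ratio, d_tab, draws=1000, tol=0.1, seed=1):
    """Classify `draws` pairs of samples, each screened so that s1/s2 is close to `ratio`."""
    rng = random.Random(seed)
    counts = [0, 0, 0]  # d < -d_tab, central class, d > +d_tab
    for _ in range(draws):
        # Assumption: a random variance ratio with (n1, n2) degrees of freedom
        # stands in for the equally spaced fractions P_i of the text.
        f = (rng.gammavariate(n1 / 2, 2) / n1) / (rng.gammavariate(n2 / 2, 2) / n2)
        sigma1 = 1.0
        sigma2 = math.sqrt(f * (n2 + 1) / ((n1 + 1) * ratio ** 2))
        while True:  # keep the first pair whose ratio agrees within the tolerance
            m1, v1 = mean_and_var_of_mean([rng.gauss(0.0, sigma1) for _ in range(n1 + 1)])
            m2, v2 = mean_and_var_of_mean([rng.gauss(0.0, sigma2) for _ in range(n2 + 1)])
            if abs(math.sqrt(v1 / v2) - ratio) <= tol * ratio:
                break
        d = (m1 - m2) / math.sqrt(v1 + v2)
        if d < -d_tab:
            counts[0] += 1
        elif d > d_tab:
            counts[2] += 1
        else:
            counts[1] += 1
    return counts
```

Calling, for instance, `sample_reference_set(6, 6, 1.0, 2.447)` returns the three class counts; here 2.447 is the ordinary 5% point of t for 6 degrees of freedom, used only as a stand-in for a tabular value of Behrens' test. A finite tolerance and a finite number of draws make the counts approximate.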


4. THE WEIGHTED MEAN


The method set out above of generating a sample of the only possible reference
set appropriate to the data may be used to illustrate the solution of some other related
problems. For these we should suppose that a sample of a million pairs of samples
has been obtained having the correct ratio $s_1/s_2$. These pairs will differ from each
other in the relative difference $d\ (= (\bar{x}_1-\bar{x}_2)/\sqrt{s_1^2+s_2^2})$ observed, but, as in the case
of the ratio $s_1/s_2$, a selection can be made agreeing sufficiently with the original
observations in this particular. This doubly screened sample will still show variation in
the position of the true common mean relative to the two means observed, and will
supply the appropriate verification of the distribution of the true mean, which as
L. C. Payne has observed, depends on the four parameters $n_1$, $n_2$, $s_1/s_2$ and d and
would, therefore, be exceedingly troublesome to tabulate, though capable of an effective
asymptotic representation.


For this, from the primary equations 


И = 8s 
Ш05 = Soto, 
we may derive и—= = Su, 


82-610 рз. 


= 
81-88 Mis 


а 


where 


1 Қа) 
so that 75% == ate 3 


and $u_P$ is defined by the relation


$$\Pr(u < u_P) = P.$$
Then if


p ue | еа, 
т 


up may be evaluated by an expansion 


Up X У ty, ni’ Ng’, 
re 
where tp is a polynomial in т, 4 and cos 0, sin 0, such as 


= à f(a9-I-2)e*-1- 4d(*-1- 1)c8s-+(6d2—2)arc?s?-+-4(d3— d)cs*} 
1 1 


Lo l (ann dyes 4 (6d2—2)z0%*—4d(22-1 Jes?--(c3-I-a)a) 
т, Any 


in which с and s stand for the cosine and sine of 0, and и„ may be obtained from its 
conjugate %,, by interchanging с and s and reversing the sign of d. 


The expansion is similar to, though owing to the additional parameter, 
somewhat more complex than the expansion for d (Table 2 of my 1941 paper). Unlike 
the distribution of d, that of the weighted mean is not symmetrical. 


5. THE UNKNOWN VARIANCES 


A third problem has been regarded as quite insoluble, namely to determine the
simultaneous distribution of $\sigma_1$ and $\sigma_2$, when, with such data, it is also given that the
true means are equal. It may be shown without difficulty that, while the three
statistics, $s_1$, $s_2$, d are jointly exhaustive, yet no two functions of these three provide
simultaneous exhaustive estimates of $\sigma_1$ and $\sigma_2$. Nor has any function of these
quantities a sampling distribution independent simultaneously of $\sigma_1$ and $\sigma_2$, as would
be the case of an Ancillary Statistic.


Now if c, and cj stand for the standard deviations chosen for the two 
populations to be sampled experimentally, and s; and 35 for the estimates of the 
standard deviations obtained for the means, because these have been selected so that, 


8p 8 
where s, and s, are the estimates from the original data, we may obtain c, and с, so 
that 
бі and 72 
4 тү 0% 
are also in the same ratio. 



Within the selection that has been made to give the correct values of the
statistics $s_1$, $s_2$ and d, we now have a distribution of pairs of values of $\sigma_1$ and $\sigma_2$, so that
a statement of the form

$$\Pr(\sigma_1 < \alpha,\ \sigma_2 < \beta) = F(\alpha, \beta, s_1, s_2, d)$$

can be established.


The function on the right can be expressed as the ratio

$$\int_0^{\alpha} d\sigma_1 \int_0^{\beta} d\sigma_2\, W \;\div\; \int_0^{\infty} d\sigma_1 \int_0^{\infty} d\sigma_2\, W$$


where W takes the form 


1 
о 
opt оз +0303 


e 81,8, ,. (2—23)? 
= ат 


The three statistics in the index of the exponential term are jointly exhaustive.
Hence there is no other observable which could be used to define a recognizable and
relevant sub-set. Without the third statistic the observations would merely justify
the distribution appropriate to the equivalence,


for two independent random variables $\chi^2$.


The additional observation A 
21-2) 


introduces the factor 


CQ bs “ (2-2)! 
жазы аа) 
For variation of 02--02 this is stationary when 


1 ex (2-2)! I 


1+0 (01-08) 
The factor increases with $\sigma_1^2+\sigma_2^2$ up to a maximum when
$$\sigma_1^2+\sigma_2^2 = (\bar{x}_1-\bar{x}_2)^2,$$

thereafter decreasing.
The example is interesting as showing that the possibility of making simultaneous
probability statements about parameters is not limited to cases of
simultaneous exhaustive estimation, in the strict sense, though a jointly exhaustive
set of statistics is still utilized.
SIMPLE APPROXIMATIONS TO THE PROBABILITY INTEGRAL
AND P(χ², 1) WHEN BOTH ARE SMALL


By J. B. S. HALDANE
Indian Statistical Institute


The most frequently used test of significance involves either the “tail” of the
probability integral, or what is equivalent, the probability of χ² with one degree of
freedom. I believe, moreover, that it is more efficient, when using tests of significance,
to think in terms of −log Q (in the Pearson-Hartley (1954) notation), rather than Q.
If −log Q < 1.3 we do not usually regard a deviation as significant. If −log Q > 3 we
regard it as highly significant. Good (1950) has given an approximation to −log Q,
for a reduced normal variate, namely


— 10105 Q = 25 224-4410 logy. 55 йй ЛД) 


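which is adequate for most purposes. My own is somewhat more accurate. It is well
known that for a reduced normal distribution

$$Q = \int_x^{\infty} (2\pi)^{-1/2} e^{-t^2/2}\,dt = (2\pi)^{-1/2} e^{-x^2/2} x^{-1}\left(1 - x^{-2} + 1\cdot3\,x^{-4} - 1\cdot3\cdot5\,x^{-6} + 1\cdot3\cdot5\cdot7\,x^{-8} - \dots\right),$$

the series being of course an asymptotic expansion which is of little value for
computation till x > 4. Hence

$$Q = (2\pi)^{-1/2} e^{-x^2/2}\left(x^2 + 2 - 3x^{-2} + 16x^{-4} - 124x^{-6} + 1224x^{-8} - \dots\right)^{-1/2},$$

$$\ln Q = -\tfrac12 x^2 - \tfrac12\ln(x^2+2) - \tfrac12\ln 2\pi - \tfrac12\ln\left(1 - 3x^{-4} + 22x^{-6} - 168x^{-8} + 1560x^{-10} - \dots\right)$$

$$= -\tfrac12 x^2 - \tfrac12\ln(x^2+2) - \tfrac12\ln 2\pi + \tfrac12\left(3x^{-4} - 22x^{-6} + \tfrac{345}{2}x^{-8} - 1626x^{-10} + \dots\right),$$

$$-\log_{10} Q = \tfrac12(0.434286)\,x^2 + \tfrac12\log_{10}(x^2+2) + 0.399 \text{ nearly.} \qquad (2)$$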


I use the approximation .434286 for $\log_{10} e$ = .434294. Good’s approximation
is equivalent to .43. The following table shows the accuracy of (2).


x                     2       2.5     3       3.5
−log Q from (2)       1.657   2.214   2.874   3.636
−log Q (correct)      1.643   2.207   2.870   3.633


It is clear that for values of x exceeding 2 the approximation is quite
satisfactory, and there is no objection to using .40 instead of .399. Curiously enough the
expression

$$-\log_{10} Q = \tfrac12(0.434286)\,x^2 + \tfrac12\log_{10}(x^2+2-3x^{-2}) + .399$$

does not give any greater accuracy in the neighbourhood of 2. When x = 2, it gives
−log Q = 1.628, whose error happens to be the same as that of (2).
For χ² we have
$$-\log_{10} P(\chi^2) = \tfrac12(0.434286)\,\chi^2 + \tfrac12\log_{10}(\chi^2+2) + .098 \text{ nearly.} \qquad (3)$$


We may use 0.1 for .098 with little disadvantage. Clearly the error is the same
as that of (2).
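
As a check on the table, approximations (2) and (3) can be compared with values computed from the complementary error function. This is a small sketch only; the function names are hypothetical, and the coefficient of x² is taken as half of .434286, the value quoted in the text for the approximation to log₁₀ e.

```python
import math

LOG10_E_APPROX = 0.434286  # approximation to log10(e) quoted in the text


def neg_log10_q_exact(x):
    """-log10 of the upper tail Q(x) of the reduced normal distribution."""
    return -math.log10(0.5 * math.erfc(x / math.sqrt(2.0)))


def neg_log10_q_approx(x):
    """Approximation (2): 0.5*log10(e)*x^2 + 0.5*log10(x^2 + 2) + 0.399."""
    return 0.5 * LOG10_E_APPROX * x * x + 0.5 * math.log10(x * x + 2.0) + 0.399


def neg_log10_p_chi2_approx(chi2):
    """Approximation (3) for chi-square with one degree of freedom."""
    return 0.5 * LOG10_E_APPROX * chi2 + 0.5 * math.log10(chi2 + 2.0) + 0.098


# Print x, the value from (2), and the value from erfc, for the tabulated x.
for x in (2.0, 2.5, 3.0, 3.5):
    print(f"{x:4.1f}  {neg_log10_q_approx(x):.3f}  {neg_log10_q_exact(x):.3f}")
```

Since P(χ²) = 2Q for χ² = x², the constant in (3) differs from that in (2) by log₁₀ 2 ≈ .301, which the two approximate functions reproduce.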


A more interesting, but probably less useful, approximation to the asymptotic 
expansion is obtained as follows. 


$$1 - x^{-2} + 3x^{-4} - 15x^{-6} + 105x^{-8} - 945x^{-10} + 10395x^{-12} - 135135x^{-14} + \dots$$
$$= 1 - (x^2+3)^{-1} - 6x^{-6}\left(1 - 13x^{-2} + 144x^{-4} - 1692x^{-6} + 22401x^{-8} - \dots\right)$$
$$= 1 - (x^2+3)^{-1} - 6(x^6 + 13x^4 + 25x^2 + 145)^{-1} - 12720x^{-14} + \dots$$


Unfortunately the next polynomial, if the process is repeated, does not have integral
coefficients. However the expression


$$Q = (2\pi)^{-1/2}\,x^{-1} e^{-x^2/2}\left[1 - (x^2+3)^{-1} - 6(x^6 + 13x^4 + 25x^2 + 145)^{-1} + O(x^{-14})\right] \qquad (4)$$

is more accurate than (2) for x > 2. When x = 2 it gives −log Q = 1.6416, or
Q = .022826, the correct value being .022750. For values of x exceeding 3 it is very
accurate, and might perhaps have been used instead of the continued fraction which
Sheppard (1939) actually employed. Its greatest interest is perhaps that it can be 
employed for the approximate summation of a number of asymptotic expansions, 
such as that for the “tail” of the exponential integral. The general expression is 


L=M Eh HLA hh)... 
= L—MtC-Eh-E1)3—R0-- --5)-- (h4-2)(3h 4- 1 -(5-Е2)(5#-- 7)--0(7).- 
On putting h = ½, t = ½x², we readily find (4).
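
Expression (4) is easily evaluated numerically. The sketch below, with hypothetical function names, drops the remainder term and compares the result with the value obtained from the complementary error function.

```python
import math


def q_exact(x):
    """Upper tail Q(x) of the reduced normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_expression_4(x):
    """Q from expression (4), with the O(x^-14) remainder dropped."""
    x2 = x * x
    bracket = (
        1.0
        - 1.0 / (x2 + 3.0)
        - 6.0 / (x2 ** 3 + 13.0 * x2 ** 2 + 25.0 * x2 + 145.0)
    )
    return math.exp(-0.5 * x2) / (x * math.sqrt(2.0 * math.pi)) * bracket
```

At x = 2 this reproduces the figure .022826 quoted above, and at x = 3 the relative error is of the order of one part in ten thousand.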
^. "here are sound theoretical grounds for using log ( 
credibility of a hypothesis, It is however hard to find a simple approximation to this 


quantity, I suggest that Q might be expressed іп C.W. Allen’s (1955) expression dex, 
(п dew = 10"), QA mi i 


Q 
Sy as а measure of the 


P(x?) were .02 for a particular sample 
MECHANISTIC MODEL OF A RANDOM PHENOMENON*


By M. KALECKI
Warsaw


SUMMARY. Let m denote the number of watches whose faces have a circumference equal to
unity and are divided into two segments, a black and a white one. Each of the watches has only one
hand, whose point moves with the velocity $\theta_1, \theta_2, \dots, \theta_m$. We check at intervals of a unit of time all
the m watches to see whether their hands are in the black or in the white field. If the number of black
fields is even, we record A; if it is odd, we record B.

Let $(A)_{Nm}$ denote the frequency of the occurrence of A in N observations, and $(A_jA_{j+a})_{Nm}$ the
frequency of the occurrence of the combination AA in the j-th and the (j+a)-th observations. In a similar
way we define the symbols $(B)_{Nm}$ and $(B_jB_{j+a})_{Nm}$, $(A_jB_{j+a})_{Nm}$ and $(B_jA_{j+a})_{Nm}$.

Under certain assumptions regarding the velocities $\theta_1, \theta_2, \dots, \theta_m$ and under the assumption that
the length $p_k$ of the arc of the black segment on the k-th watch satisfies the condition $|1-2p_k| < \alpha < 1$
where α is constant, the author obtains the following theorems:

(1) $\lim_{m\to\infty}\lim_{N\to\infty}(A)_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(B)_{Nm} = \tfrac12$;

(2) $\lim_{m\to\infty}\lim_{N\to\infty}(A_jA_{j+a})_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(B_jB_{j+a})_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(A_jB_{j+a})_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(B_jA_{j+a})_{Nm} = \tfrac14$,

which means that the j-th and the (j+a)-th observations are asymptotically independent with any degree of
approximation if m is sufficiently large.

(3) The j-th observation is, in the same manner, asymptotically independent of all the a observations
preceding it.


1. THE PRINCIPLES OF THE “WATCH GAME”


Let us imagine m watches whose faces have a circumference equal to unity
and are divided into two sectors, a black and a white. Each watch has only
one hand, whose point moves with the constant velocity $\theta_1, \theta_2, \dots, \theta_m$. We check at
intervals of a time unit all the m watches in order to find whether the hand is in the
black or the white sector. In case the number of “black sectors” is even we record
A; in case it is odd we record B. This is one lot. The game consists of guessing
whether A or B will occur. We assume that the velocities $\theta_1, \theta_2, \dots, \theta_m$ are irrational
and arithmetically independent (i.e. no $\theta_k$ is a linear function with rational coefficients
of the other θ's).


If the initial positions of the points of the hands are recorded as zero, then the
positions in subsequent N lots will be:
$$\theta_1-[\theta_1],\quad 2\theta_1-[2\theta_1],\quad \dots,\quad N\theta_1-[N\theta_1],$$
$$\theta_2-[\theta_2],\quad 2\theta_2-[2\theta_2],\quad \dots,\quad N\theta_2-[N\theta_2],$$
$$\dots$$
$$\theta_m-[\theta_m],\quad 2\theta_m-[2\theta_m],\quad \dots,\quad N\theta_m-[N\theta_m],$$
where [θ] denotes the largest integer less than θ.
It should be noticed that simultaneously with the game based on the set of
m watches analogous games based on the subset of the first k watches can be played.

* A Polish version of this paper appeared in Zastosowania Matematyki, IV, 2 (1958).
2. FREQUENCY OF OCCURRENCE OF A AND B


Notations. Let us denote the frequency of occurrence of the white sector
on the k-th watch by $w_{Nk}$, and of the occurrence of the black sector by $b_{Nk}$; moreover,
let us denote the length of the arc of the white sector on the k-th watch by $p_k$; thus
the length of the arc of the black sector will be $1-p_k$; finally, let us denote by $(A)_{Nk}$
and $(B)_{Nk}$ the frequency of the occurrence of A and B respectively in the game based
on the subset of the first k watches. Thus the frequency of the occurrence of A
and B in the game based on all m watches will be denoted by $(A)_{Nm}$ and $(B)_{Nm}$
respectively.


Theorem 1: If $|1-2p_k| < \alpha < 1$, where α is a constant number and
k = 1, 2, ..., m, then
$$\lim_{m\to\infty}\lim_{N\to\infty}(A)_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(B)_{Nm} = \tfrac12.$$


Proof: From the theorems in the Appendix the following characteristics of
the positions of hands may be derived. If, as assumed, the velocities $\theta_1, \theta_2, \dots, \theta_m$
are irrational and none of them is a linear function with rational coefficients of the other
θ's we have:


(1) According to Theorem I of the Appendix
$$\lim_{N\to\infty} w_{Nk} = p_k, \qquad \lim_{N\to\infty} b_{Nk} = 1-p_k;$$


(2) According to Theorem II of the Appendix the frequency of the occurrence
in one lot of the white sector on watch k and of the white sector on watch l approaches
$p_k p_l$ when N → ∞. The limits of frequency for the white-black, black-white and
black-black combinations will be $p_k(1-p_l)$, $(1-p_k)p_l$ and $(1-p_k)(1-p_l)$, respectively.
An analogous theorem will hold good for the combination including any subset of
watches or all the m watches. In the terminology of the probability calculus we may
say that the position of the hand of the k-th watch is asymptotically independent of
the positions of hands of other watches.


Let us take into consideration the game based on the subset of the first k
watches. It follows from the characteristics (1) and (2) that $(A)_{Nk}$ approaches a
definite limit $(A)_k$ while $(B)_{Nk}$ approaches $(B)_k$ when N → ∞.

Let us add to the subset the (k+1)-th watch. In the game based on this new
subset in a given lot A will occur either when in the game based on the subset of the
first k watches A occurred in that lot and on the (k+1)-th watch the hand stopped in
the white sector; or when in the game based on the subset of the first k watches
B occurred in that lot and on the (k+1)-th watch the hand stopped in the black
sector.


Taking into consideration the “independence” of the positions of the hands
we obtain for the limit of frequency $(A)_{k+1}$
$$(A)_{k+1} = (A)_k\,p_{k+1} + (B)_k\,(1-p_{k+1}).$$
Analogously
$$(B)_{k+1} = (B)_k\,p_{k+1} + (A)_k\,(1-p_{k+1}).$$
By means of full induction we obtain:
$$(A)_m-(B)_m = (2p_1-1)(2p_2-1)\cdots(2p_m-1).$$
From the assumption $|1-2p_k| < \alpha < 1$ it follows that
$$\lim_{m\to\infty}\,\bigl((A)_m-(B)_m\bigr) = 0.$$


Thus we have
$$\lim_{m\to\infty}(A)_m = \lim_{m\to\infty}(B)_m = \tfrac12$$
or
$$\lim_{m\to\infty}\lim_{N\to\infty}(A)_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(B)_{Nm} = \tfrac12,$$

and this at any values of $p_1, p_2, \dots, p_m$ if only the condition $|1-2p_k| < \alpha < 1$ is
fulfilled.
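
The induction can be followed numerically. In the sketch below, whose function names are hypothetical, each $p_k$ is taken to be the length of the white arc, as in the Notations above; the limiting frequency of A is built up one watch at a time and compared with the closed form ½(1 + Π(2p_k − 1)) implied by the product for the difference between the frequencies of A and B.

```python
def limit_frequency_of_a(white_arcs):
    """Limiting frequency of A, obtained by adding one watch at a time."""
    freq_a, freq_b = 1.0, 0.0  # with no watches the number of blacks is 0, i.e. even
    for p in white_arcs:
        # A persists if the new hand is in white; B turns into A if it is in black.
        freq_a, freq_b = (
            freq_a * p + freq_b * (1.0 - p),
            freq_b * p + freq_a * (1.0 - p),
        )
    return freq_a


def closed_form(white_arcs):
    """One half of (1 + product of (2 p_k - 1))."""
    product = 1.0
    for p in white_arcs:
        product *= 2.0 * p - 1.0
    return 0.5 * (1.0 + product)
```

For twenty watches with white arcs of length 0.3 the product is 0.4 raised to the twentieth power, so the limiting frequency of A differs from ½ by less than one part in a hundred million.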


3. “INDEPENDENCE” OF TWO SUBSEQUENT LOTS 


Let us denote the frequency of the change from black to white, or vice versa,
in two subsequent lots (i.e. the occurrence of a white-black or black-white
combination) on the k-th watch by $\delta_{Nk}$. The frequency of “no change” in two subsequent
lots (i.e. the occurrence of a white-white or black-black combination) will thus be
equal to $1-\delta_{Nk}$. Moreover, let us denote the frequencies of the occurrence of the
combinations AA, BB, AB and BA in two subsequent lots of the game based on the
subset of the first k watches by $(AA)_{Nk}$, $(BB)_{Nk}$, $(AB)_{Nk}$ and $(BA)_{Nk}$ respectively.
Accordingly the frequencies of the occurrence of AA, BB, AB and BA in two
subsequent lots of the game based on all m watches will be denoted by $(AA)_{Nm}$, $(BB)_{Nm}$,
$(AB)_{Nm}$ and $(BA)_{Nm}$. Finally, let us denote by $Z_{Nk}$ the frequency of change from
A to B or vice versa in two subsequent lots of the game based on the first k watches;
thus we have:

$$Z_{Nk} = (AB)_{Nk}+(BA)_{Nk}; \qquad 1-Z_{Nk} = (AA)_{Nk}+(BB)_{Nk}.$$
For the game based on all m watches we have obviously

$$Z_{Nm} = (AB)_{Nm}+(BA)_{Nm}; \qquad 1-Z_{Nm} = (AA)_{Nm}+(BB)_{Nm}.$$
Finally, let us denote $\theta_k-[\theta_k]$ by $\Theta_k$.

Theorem 2: If $|1-2p_k| < \alpha < 1$ and $\Theta_k > \beta > 0$, $1-\Theta_k > \beta > 0$,
$|\tfrac12-\Theta_k| > \beta > 0$ where α and β are constant numbers, then

$$\lim_{m\to\infty}\lim_{N\to\infty}(AA)_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(BB)_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(AB)_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(BA)_{Nm} = \tfrac14.$$

Proof: From the theorems in the Appendix the following characteristics of
the positions of the hands may be derived:

(3) According to Theorem III in the Appendix $\delta_{Nk}$ approaches a certain limit
$\delta_k$ when N → ∞, or

$$\lim_{N\to\infty}\delta_{Nk} = \delta_k \quad\text{and}\quad \lim_{N\to\infty}(1-\delta_{Nk}) = 1-\delta_k.$$


At the same time according to Theorem V in the Appendix, if $|1-2p_k| < \alpha < 1$ and
$\Theta_k > \beta > 0$, $1-\Theta_k > \beta > 0$, $|\tfrac12-\Theta_k| > \beta > 0$, where α and β are constant numbers,
then $|1-2\delta_k| < \lambda < 1$, where λ is also a constant number.
(4) According to Theorem IV of the Appendix the frequency of the occurrence
of “change” (or “no change”) in two subsequent positions of the hand of the k-th
watch is “independent” of the frequency of the “change” (or “no change”) on other
watches. For instance the limit of the frequency of the combination of “change” on the
k-th watch and of “no change” on the l-th watch amounts to $\delta_k(1-\delta_l)$.


Let us now take into consideration the game based on the subset of the first
k watches. It follows from the characteristics (3) and (4) that if N → ∞, then $Z_{Nk}$
and $1-Z_{Nk}$ tend toward definite limits which we shall denote by $Z_k$ and $1-Z_k$
respectively. Let us add to the subset the (k+1)-th watch. In the game based on
this new subset the “change” will occur in two subsequent lots in two cases:


(1) if in the game based on the subset of the first k watches there was “change”
and on the (k+1)-th watch “no change”; because then in the first of the two subsequent
lots we have in the subset of the first k watches an odd number of “blacks” and in
the second lot an even number of “blacks” or vice versa; and on the (k+1)-th watch we have
two “blacks” or two “whites”;


(2) if in the game based on the subset of the first k watches there was “no
change” and on the (k+1)-th watch there was “change”; because then in both
subsequent lots in the subset of the first k watches we have an even or odd number of
“blacks”, and on the (k+1)-th watch we have the combinations “black-white” or
“white-black”.


Taking into consideration the “independence” of the occurrence of “change”
and “no change” on particular watches we find for the limit of frequency of “change”

$$Z_{k+1} = Z_k(1-\delta_{k+1}) + (1-Z_k)\,\delta_{k+1}.$$
Analogously we obtain
$$1-Z_{k+1} = (1-Z_k)(1-\delta_{k+1}) + Z_k\,\delta_{k+1}.$$
It follows that
$$2Z_{k+1}-1 = (1-2\delta_{k+1})(2Z_k-1)$$

and by means of full induction
$$1-2Z_m = (1-2\delta_1)(1-2\delta_2)\cdots(1-2\delta_m).$$
As $|1-2\delta_k| < \lambda < 1$, where λ is a constant number, we obtain

$$\lim_{m\to\infty}(1-2Z_m) = 0, \quad\text{or}\quad \lim_{m\to\infty} Z_m = \tfrac12.$$
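
The step from the recurrence for the frequency of “change” to the product formula may be written out in full. The following is only a restatement of the algebra; the starting value $Z_0 = 0$ (no watches, hence no change) is an assumption introduced here for the induction.

```latex
\begin{aligned}
2Z_{k+1}-1 &= 2Z_k(1-\delta_{k+1}) + 2(1-Z_k)\,\delta_{k+1} - 1 \\
           &= (2Z_k-1) - 2\delta_{k+1}(2Z_k-1) \\
           &= (1-2\delta_{k+1})(2Z_k-1), \\
1-2Z_m     &= (1-2Z_0)\prod_{k=1}^{m}(1-2\delta_k) = \prod_{k=1}^{m}(1-2\delta_k), \\
|1-2Z_m|   &< \lambda^{m} \longrightarrow 0 \quad (m \to \infty).
\end{aligned}
```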


We can prove now that the frequency of combinations AA, BB, AB and BA in two
subsequent lots of the game based on all m watches approaches the limit ¼ when
N → ∞ and m → ∞. First of all we have $(AB)_{Nm}+(BA)_{Nm} = Z_{Nm}$. From this it follows
that

$$\lim_{N\to\infty}\bigl((AB)_{Nm}+(BA)_{Nm}\bigr) = Z_m.$$

Moreover we have

$$\lim_{N\to\infty}\bigl((AB)_{Nm}+(AA)_{Nm}\bigr) = (A)_m \quad\text{and}\quad \lim_{N\to\infty}\bigl((BA)_{Nm}+(AA)_{Nm}\bigr) = (A)_m$$

or
$$\lim_{N\to\infty}\bigl((AB)_{Nm}-(BA)_{Nm}\bigr) = 0.$$
We thus obtain
$$\lim_{N\to\infty}(AB)_{Nm} = \lim_{N\to\infty}(BA)_{Nm} = \tfrac12 Z_m.$$
As we have proved above that $\lim_{m\to\infty} Z_m = \tfrac12$, we obtain

$$\lim_{m\to\infty}\lim_{N\to\infty}(AB)_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(BA)_{Nm} = \tfrac14.$$

Taking into consideration that

$$(AA)_{Nm}+(BB)_{Nm}+(AB)_{Nm}+(BA)_{Nm} = 1,$$

it may easily be shown that also

$$\lim_{m\to\infty}\lim_{N\to\infty}(AA)_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(BB)_{Nm} = \tfrac14,$$

or that two subsequent lots of the game based on m watches are asymptotically
“independent” of each other to any degree of approximation if m is sufficiently large.


4. GENERALISATIONS 


First, we shall deal with the “independence” of the j-th and (j+a)-th lot. Let
a be a definite positive integer. Let us introduce the following notations. The
frequency of “change” from black to white and vice versa between the j and j+a
positions of the hand on the k-th watch will be denoted by $\delta_{Nk}^{(a)}$. Then the frequency
of the hands being both in the lot j and j+a in the white or black sector will be
equal to $1-\delta_{Nk}^{(a)}$. Moreover, let us denote by $A_j$ and $B_j$ the phenomena consisting of
the occurrence of A and B respectively in the j-th lot.


The frequency of occurrence of combinations AA, BB, AB and BA in the j and j+a
lots of the game based on the first k watches will be denoted by $(A_jA_{j+a})_{Nk}$, $(B_jB_{j+a})_{Nk}$,
$(A_jB_{j+a})_{Nk}$ and $(B_jA_{j+a})_{Nk}$; and in the game based on all the m watches by $(A_jA_{j+a})_{Nm}$,
$(B_jB_{j+a})_{Nm}$, $(A_jB_{j+a})_{Nm}$ and $(B_jA_{j+a})_{Nm}$. Finally, we shall denote $a\theta_k-[a\theta_k]$ by $\Theta_k^{(a)}$.
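Theorem 3: If $|1-2p_k| < \alpha < 1$ and $\Theta_k^{(a)} > \beta^{(a)} > 0$, $1-\Theta_k^{(a)} > \beta^{(a)} > 0$,
$|\tfrac12-\Theta_k^{(a)}| > \beta^{(a)} > 0$ where α and $\beta^{(a)}$ are constant numbers then

$$\lim_{m\to\infty}\lim_{N\to\infty}(A_jA_{j+a})_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(B_jB_{j+a})_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(A_jB_{j+a})_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(B_jA_{j+a})_{Nm} = \tfrac14.$$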


Proof: From Theorem VI of the Appendix appropriate generalisations, (5) and (6), of
the characteristics of the positions of the hands may be derived:
(5) $\delta_{Nk}^{(a)}$ approaches a certain limit $\delta_k^{(a)}$ if N → ∞, or
$$\lim_{N\to\infty}\delta_{Nk}^{(a)} = \delta_k^{(a)}.$$


Moreover, if $|1-2p_k| < \alpha < 1$ and $\Theta_k^{(a)} > \beta^{(a)} > 0$, $1-\Theta_k^{(a)} > \beta^{(a)} > 0$,
$|\tfrac12-\Theta_k^{(a)}| > \beta^{(a)} > 0$ where α and $\beta^{(a)}$ are constant numbers, then $|1-2\delta_k^{(a)}| < \lambda^{(a)} < 1$
where $\lambda^{(a)}$ is as well a constant number.

(6) The frequency of the occurrence of “change” (or “no change”) between the
j-th and (j+a)-th lots on the k-th watch is “independent” of the frequency of “change”
(or “no change”) on other watches. For instance, the limit of frequency of the
combination of “change” on the k-th watch and of “no change” on the l-th watch
in the j-th and (j+a)-th lots amounts to $\delta_k^{(a)}(1-\delta_l^{(a)})$.

On the basis of these characteristics the theorem may be demonstrated in
the same way as the previous one. A further generalisation concerns the simultaneous
independence of the j+1, j+2, ..., j+a lots from the j-th lot.

Let us denote the least common multiple of the integers not larger than a by v and
$v\theta_k-[v\theta_k]$ by $\Theta_k^{(v)}$.

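Theorem 4: If $|1-2p_k| < \alpha < 1$ and $\Theta_k^{(v)} > \beta^{(v)} > 0$, $1-\Theta_k^{(v)} > \beta^{(v)} > 0$,
$|\tfrac12-\Theta_k^{(v)}| > \beta^{(v)} > 0$ where α and $\beta^{(v)}$ are constant numbers then

$$\lim_{m\to\infty}\lim_{N\to\infty}(A_jA_{j+i})_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(B_jB_{j+i})_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(A_jB_{j+i})_{Nm} = \lim_{m\to\infty}\lim_{N\to\infty}(B_jA_{j+i})_{Nm} = \tfrac14 \quad (i = 1, 2, \dots, a).$$

Proof: It follows from Theorem 3 that Theorem 4 holds good when
conditions (a), (b) and (c) are simultaneously fulfilled

(a) $i\theta_k-[i\theta_k] > \beta^{(i)} > 0$,
(b) $1-i\theta_k+[i\theta_k] > \beta^{(i)} > 0$,
(c) $|\tfrac12-i\theta_k+[i\theta_k]| > \beta^{(i)} > 0$,

where $\beta^{(i)}$ (i = 1, 2, ..., a) are constant numbers. We shall prove now that these
conditions are fulfilled for $\beta^{(i)} = \frac{i}{v}\beta^{(v)}$.


First of all we have

(a′) $v\theta_k-[v\theta_k] > \beta^{(v)} > 0$,
(b′) $1-v\theta_k+[v\theta_k] > \beta^{(v)} > 0$,
(c′) $|\tfrac12-v\theta_k+[v\theta_k]| > \beta^{(v)} > 0$.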
Let us start from condition (a). We have obviously

$$i\theta_k-[i\theta_k] > 0$$
(because $\theta_k$ is an irrational number). Let us multiply both sides of this inequality
by the expression v/i, which is an integer, as v is a multiple of i:
$$v\theta_k-\frac{v}{i}[i\theta_k] > 0.$$
As $[v\theta_k]$ is the largest integer contained in $v\theta_k$ we have:
$$v\theta_k-\frac{v}{i}[i\theta_k] \ge v\theta_k-[v\theta_k].$$

Hence and from (a′) it follows that
$$v\theta_k-\frac{v}{i}[i\theta_k] > \beta^{(v)}.$$
Dividing by v/i we obtain
$$i\theta_k-[i\theta_k] > \frac{i}{v}\beta^{(v)} > 0,$$
which proves that condition (a) is fulfilled.
Let us consider now condition (b). We have $1-i\theta_k+[i\theta_k] > 0$. Let us
multiply both sides of this inequality by v/i:

$$\frac{v}{i}-v\theta_k+\frac{v}{i}[i\theta_k] > 0.$$

Since $[v\theta_k]+1$ is the smallest integer greater than $v\theta_k$ we have

$$\frac{v}{i}-v\theta_k+\frac{v}{i}[i\theta_k] \ge 1-v\theta_k+[v\theta_k].$$

Hence and from (b′) it follows that

$$\frac{v}{i}-v\theta_k+\frac{v}{i}[i\theta_k] > \beta^{(v)} > 0.$$

Dividing by v/i we obtain
$$1-i\theta_k+[i\theta_k] > \frac{i}{v}\beta^{(v)} > 0,$$
which proves that condition (b) is fulfilled.
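Finally, for condition (c) the proof is as follows. Multiplying the expression
$|\tfrac12-i\theta_k+[i\theta_k]|$ by $v/i$ we obtain:

$$\left|\frac{v}{2i}-v\theta_k+\frac{v}{i}[i\theta_k]\right|.$$

Since either $\tfrac12+[v\theta_k]$ or $1+[v\theta_k]$ is the multiple of the number $\tfrac12$ closest to $v\theta_k$,
we have either

$$\left|\frac{v}{2i}-v\theta_k+\frac{v}{i}[i\theta_k]\right| \ge \left|\tfrac12-v\theta_k+[v\theta_k]\right|$$

or

$$\left|\frac{v}{2i}-v\theta_k+\frac{v}{i}[i\theta_k]\right| \ge 1-v\theta_k+[v\theta_k].$$

But according to the assumptions (b') and (c') the expressions on the right hand side of
both these inequalities are not smaller than $\beta^0$, which is positive. Thus

$$\left|\frac{v}{2i}-v\theta_k+\frac{v}{i}[i\theta_k]\right| \ge \beta^0 > 0$$

and after dividing by $v/i$ we obtain

$$\left|\tfrac12-i\theta_k+[i\theta_k]\right| \ge \frac{i}{v}\beta^0 > 0,$$

which proves that condition (c) is fulfilled as well.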
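
The three bounds just derived can be checked numerically. The following Python sketch is only an illustration: the function names `frac`, `lcm_upto` and `bounds_hold`, the tolerance and the sample values of $\theta$ are assumptions of this example and do not come from the paper. It takes $\beta^0$ as the smallest of the three quantities in (a'), (b'), (c') and verifies that conditions (a), (b), (c) hold with $\beta^{(i)} = \frac{i}{v}\beta^0$ for every $i \le a$.

```python
import math
from functools import reduce


def frac(x):
    """Fractional part x - [x]."""
    return x - math.floor(x)


def lcm_upto(a):
    """Least common multiple v of the integers 1, 2, ..., a."""
    return reduce(lambda u, w: u * w // math.gcd(u, w), range(1, a + 1), 1)


def bounds_hold(theta, a, tol=1e-9):
    """Check conditions (a), (b), (c) with beta_i = (i / v) * beta_0."""
    v = lcm_upto(a)
    f = frac(v * theta)
    # beta_0: the largest constant for which (a'), (b'), (c') all hold.
    beta0 = min(f, 1 - f, abs(0.5 - f))
    for i in range(1, a + 1):
        g = frac(i * theta)
        # Smallest of the three left-hand sides of (a), (b), (c).
        smallest = min(g, 1 - g, abs(0.5 - g))
        if smallest < i * beta0 / v - tol:
            return False
    return True


print(bounds_hold(math.sqrt(2), 6))
```

A floating-point check of this kind cannot replace the proof; it only confirms the inequalities for particular values of $\theta$ and $a$.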
Theorem 4 may be formulated also as follows:

The result of the watch game is to any degree of approximation "independent"
of the results of the preceding $a$ lots when $N$ and $m$ are sufficiently large, if $\theta_k^0$, $1-\theta_k^0$
and $|\tfrac12-\theta_k^0|$ are greater than a certain positive constant, where $\theta_k^0 = v\theta_k-[v\theta_k]$
and $v$ is the least common multiple of the integers not greater than $a$.

It should be noticed that the limiting conditions for the set of $\theta_k$ depend
on $a$. The same concerns the magnitude of $m$ necessary for the achievement of a given
degree of approximation to "independence" of the results of the preceding $a$ lots.


As will be seen from the above we may make the results of the watch game
as close as we wish to a random sequence. First of all we assume any large, but
definite $a$. Let us next choose a set of $m$ arithmetically independent irrational
numbers $\varphi_1, \varphi_2, \ldots, \varphi_m$ satisfying the conditions:

$$\varphi_k-[\varphi_k] \ge \beta^0, \quad 1-\varphi_k+[\varphi_k] \ge \beta^0 \quad\text{and}\quad |\tfrac12-\varphi_k+[\varphi_k]| \ge \beta^0,$$

where $\beta^0$ is a constant positive number. Let us assume that the velocities of the
points of the hands of $m$ watches are the numbers $\varphi_1/v, \varphi_2/v, \ldots, \varphi_m/v$, where $v$ is the
least common multiple of the integers which are smaller than or equal to $a$. If $m$ is sufficiently
large the result of the lot will be "independent" of the results of the preceding $a$ lots
to any degree of approximation.
It is still worthwhile to consider a particular case of the set of $\theta_k$ fulfilling the conditions

(a') $v\theta_k-[v\theta_k] \ge \beta^0 > 0$,
(b') $1-v\theta_k+[v\theta_k] \ge \beta^0 > 0$,
(c') $|\tfrac12-v\theta_k+[v\theta_k]| \ge \beta^0 > 0$.

It may be proved that these conditions are satisfied when $\theta_1, \theta_2, \ldots, \theta_m$ are contained
in a sufficiently small interval. Let us denote by $\varepsilon^{(a)}$ the length of this interval.
We have then $|\theta_k-\theta_1| < \varepsilon^{(a)}$. Let $\varepsilon^{(a)}$ be sufficiently small to fulfil the conditions:

(e) $v\varepsilon^{(a)} < v\theta_1-[v\theta_1]$,
(f) $v\varepsilon^{(a)} < 1-v\theta_1+[v\theta_1]$,
(g) $v\varepsilon^{(a)} < |\tfrac12-v\theta_1+[v\theta_1]|$.


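Hence it follows that
$$|v\theta_k-v\theta_1| < v\theta_1-[v\theta_1] \quad\text{and}\quad |v\theta_k-v\theta_1| < 1-v\theta_1+[v\theta_1],$$
and in consequence $[v\theta_k] = [v\theta_1]$.
We obtain further
$$v\theta_k-[v\theta_k] = v\theta_1-[v\theta_1]+v(\theta_k-\theta_1),$$
$$1-v\theta_k+[v\theta_k] = 1-v\theta_1+[v\theta_1]-v(\theta_k-\theta_1),$$
$$|\tfrac12-v\theta_k+[v\theta_k]| = |\tfrac12-v\theta_1+[v\theta_1]-v(\theta_k-\theta_1)|.$$
It follows from the condition $|\theta_k-\theta_1| < \varepsilon^{(a)}$ that

$$v\theta_k-[v\theta_k] > v\theta_1-[v\theta_1]-v\varepsilon^{(a)},$$
$$1-v\theta_k+[v\theta_k] > 1-v\theta_1+[v\theta_1]-v\varepsilon^{(a)},$$
$$|\tfrac12-v\theta_k+[v\theta_k]| > |\tfrac12-v\theta_1+[v\theta_1]|-v\varepsilon^{(a)}.$$

From conditions (e), (f) and (g) for $\varepsilon^{(a)}$ it follows that the expressions on the
right hand side of these inequalities are positive numbers. If we denote the smallest of
them by $\beta^0$ we obtain the characteristics (a'), (b') and (c').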


As will be seen from the above, the crowding of the velocities $\theta_k$ within a sufficiently
small interval dependent on $a$, together with a sufficiently large number $m$ of watches,
creates conditions under which the result of a lot is independent of the results of the preceding
$a$ lots to any degree of approximation. It will be easily seen that such conditions
could also be assured by crowding the $\theta_k$ into a finite number of sufficiently small
intervals.

It may be of interest to note that the results arrived at in this paper have
been applied in the work of the Institute of Mathematical Machines, Warsaw. By
means of a programme for an electronic computer corresponding approximately to
the mechanism considered here, a quasi-random sequence will be generated which
is required for the application of the so-called Monte Carlo method.
Appendix
Certain characteristics of the sequences of irrational numbers $\theta, 2\theta, 3\theta, \ldots$


Definition 1: Let a sequence of real numbers be given

$$f(1), f(2), \ldots, f(N).$$

Let us denote by $N(\gamma)$ the number of members fulfilling the condition
$f(j)-[f(j)] < \gamma$, where $0 < \gamma \le 1$.
We shall say that the sequence $f$ has equipartition if

$$\lim_{N\to\infty}\frac{N(\gamma)}{N} = \gamma$$

for all admissible values of $\gamma$.
Definition 2: Let there be $m$ sequences

$$f_1(1), f_1(2), \ldots, f_1(N),$$
$$f_2(1), f_2(2), \ldots, f_2(N),$$
$$\ldots$$
$$f_m(1), f_m(2), \ldots, f_m(N).$$

Let us denote by $N(\gamma_1, \gamma_2, \ldots, \gamma_m)$ the number of columns of this set of sequences
which for each $k$ from 1 to $m$ satisfy the condition

$$f_k(j)-[f_k(j)] < \gamma_k,$$

where $0 < \gamma_k \le 1$. We shall say that the set of sequences $f_1, f_2, \ldots, f_m$ has
equipartition if

$$\lim_{N\to\infty}\frac{N(\gamma_1, \gamma_2, \ldots, \gamma_m)}{N} = \gamma_1\gamma_2\cdots\gamma_m$$

for all admissible values of the $\gamma$'s. If the set of sequences $f_1, f_2, \ldots, f_m$ has
equipartition, then each of its subsets has it as well. We can show it directly by imposing
the condition $\gamma = 1$ on all $f$ outside the subset.


Theorem I: If $\theta$ is any given irrational number, then the sequence

$$\theta, 2\theta, 3\theta, \ldots, N\theta \qquad (1)$$

has equipartition.

This theorem was proved in 1909 by Sierpinski and by Bohl. In 1916 Weyl
offered a simple proof based on a method having a wider application.*
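
Theorem I can be illustrated numerically. The sketch below is an assumption-laden example, not part of the paper: the function name `equipartition_ratio` and the choices $\theta = \sqrt2$, $\gamma = 0.3$ are hypothetical. It counts the members of the sequence (1) whose fractional part is below $\gamma$ and returns the ratio $N(\gamma)/N$ of Definition 1.

```python
import math


def equipartition_ratio(theta, gamma, n_terms):
    """Return N(gamma) / N for the sequence theta, 2*theta, ..., N*theta."""
    count = sum(1 for j in range(1, n_terms + 1)
                if (j * theta) % 1.0 < gamma)
    return count / n_terms


# For an irrational theta the ratio should approach gamma as N grows.
for n in (100, 10_000, 100_000):
    print(n, equipartition_ratio(math.sqrt(2), 0.3, n))
```

Floating-point arithmetic limits how large $N$ can usefully be taken, since the fractional part of $j\theta$ loses precision as $j$ grows.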


* See J. F. Koksma, Diophantische Approximationen, Berlin, 1936, p. 92.
Theorem II: (A particular case of Weyl's other theorem).* The set of
sequences

$$\theta_1, 2\theta_1, \ldots, N\theta_1,$$
$$\theta_2, 2\theta_2, \ldots, N\theta_2, \qquad (2)$$
$$\ldots$$
$$\theta_m, 2\theta_m, \ldots, N\theta_m$$

has equipartition, if for any given set of integers $h_1, h_2, \ldots, h_m$ which are not simultaneously
equal to zero, the expression

$$h_1\theta_1+h_2\theta_2+\ldots+h_m\theta_m$$

is an irrational number.

Equivalent of Theorem II: If all the numbers $\theta_1, \theta_2, \ldots, \theta_m$ are irrational and
arithmetically independent, then the set (2) (and each of its subsets) has equipartition.


From Theorem I we derive the following:
Theorem III: Let us denote by $S$ the number of terms of the sequence (1) which
satisfy the condition (a) or (b):
(a) $j\theta-[j\theta] < \gamma$, $(j+1)\theta-[(j+1)\theta] > \gamma$;
(b) $j\theta-[j\theta] > \gamma$, $(j+1)\theta-[(j+1)\theta] < \gamma$.
The ratio $S/N$ approaches the limit determined subsequently by formulae (3), (4)
and (5) when $N \to \infty$.

Let us denote $\theta-[\theta]$ by $\theta'$. It will be easily seen that
$$j\theta-[j\theta] = j\theta'-[j\theta'].$$

Therefore in formulae (a) and (b) it is possible to substitute $\theta'$, which is less than one,
for $\theta$. It may also be assumed that $\gamma \le \tfrac12$ without loss of generality.

The theorem may be easily demonstrated graphically. Let us measure on the
circle of circumference 1 the arc of length $j\theta'$ from the zero point clockwise.
$j\theta'-[j\theta']$ is then the distance of the point $j\theta'$ from zero. The conditions (a) and (b) mean
that should the point $j\theta'$ lie within the arc $\gamma$, the point $(j+1)\theta'$ lies beyond it, or
vice versa. Without loss of generality it may be assumed that
the beginning of the arc $\gamma$ is the zero point. Let us consider three cases

(A) $\theta' < \gamma$,
(B) $\gamma < \theta' < 1-\gamma$,
(C) $\theta' > 1-\gamma$,

to which correspond the figures 1, 2 and 3. As we shall prove below, the points
$j\theta'$ lying on the arcs marked by a heavy line satisfy the conditions (a) and (b).

[Fig. 1, Fig. 2, Fig. 3: circles of circumference 1 for the cases (A), (B) and (C), with the arcs in question marked by a heavy line.]


Let us consider the case (A). If the point $j\theta'$ lies within the arc $0, \gamma$, the
point $(j+1)\theta'$ lies beyond it if and only if the point $j\theta'$ lies on the arc
$\gamma-\theta', \gamma$. If the point $j\theta'$ lies within the arc $\gamma, 0$, the point $(j+1)\theta'$ lies beyond it
if and only if $j\theta'$ lies on the arc $1-\theta', 0$.

It follows from Theorem I that in this case

$$\lim_{N\to\infty}\frac{S}{N} = 2\theta'. \qquad (3)$$

In the case (B), if the point $j\theta'$ lies within the arc $0, \gamma$, then the point $(j+1)\theta'$ lies
beyond it. If $j\theta'$ lies within the arc $\gamma, 0$, the point $(j+1)\theta'$ lies beyond it
if and only if the point $j\theta'$ lies on the arc $1-\theta', 1-\theta'+\gamma$. In this case we have

$$\lim_{N\to\infty}\frac{S}{N} = 2\gamma. \qquad (4)$$

Finally, in the case (C), if the point $j\theta'$ lies within the arc $0, \gamma$, then the point
$(j+1)\theta'$ lies beyond it if and only if the point $j\theta'$ lies on the arc
$0, 1-\theta'$. If $j\theta'$ lies within the arc $\gamma, 0$, the point $(j+1)\theta'$ lies
beyond it if and only if $j\theta'$ lies on the arc $\gamma, \gamma+1-\theta'$. In this case we obtain

$$\lim_{N\to\infty}\frac{S}{N} = 2(1-\theta'). \qquad (5)$$

Thus we have proved that when $N \to \infty$ the ratio of the number $S$ of terms of
the sequence $\theta, 2\theta, \ldots, N\theta$ satisfying the conditions (a) or (b) to the total number
of terms approaches the limit defined by the formulae (3), (4) and (5), which we denote
by $\delta$.
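
The three limits (3), (4) and (5) can be compared with a direct count. The Python sketch below is illustrative only: the names `switch_ratio` and `delta_limit` and the sample values of $\theta$ and $\gamma$ are assumptions of this example. It assumes $\gamma \le \tfrac12$, as in the proof, and counts the indices $j$ for which exactly one of the points $j\theta'$, $(j+1)\theta'$ lies in the arc $0, \gamma$.

```python
import math


def switch_ratio(theta, gamma, n_terms):
    """Empirical S / N: share of j with condition (a) or (b) satisfied."""
    def inside(j):
        return (j * theta) % 1.0 < gamma

    switches = sum(1 for j in range(1, n_terms + 1)
                   if inside(j) != inside(j + 1))
    return switches / n_terms


def delta_limit(theta, gamma):
    """Limit delta from formulae (3), (4), (5); assumes gamma <= 1/2."""
    t = theta % 1.0                # theta' = theta - [theta]
    if t < gamma:                  # case (A)
        return 2 * t
    if t < 1 - gamma:              # case (B)
        return 2 * gamma
    return 2 * (1 - t)             # case (C)


for theta, gamma in [(math.sqrt(2), 0.45),        # case (A)
                     (math.sqrt(2), 0.30),        # case (B)
                     (math.sqrt(2) + 0.5, 0.30)]: # case (C)
    print(switch_ratio(theta, gamma, 100_000), delta_limit(theta, gamma))
```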


From the Theorems II and III we deduce the following

Theorem IV: Let us denote by $S_{1,2,\ldots,m}$ the number of columns of the set of
sequences (2) satisfying for each $k$ from 1 to $m$ the conditions (a) or (b):

(a) $j\theta_k-[j\theta_k] < \gamma_k$, $(j+1)\theta_k-[(j+1)\theta_k] > \gamma_k$,
(b) $j\theta_k-[j\theta_k] > \gamma_k$, $(j+1)\theta_k-[(j+1)\theta_k] < \gamma_k$.

Moreover let us denote the number of terms satisfying conditions (a) or (b) in the
sequence $k$ by $S_k$ and $\lim (S_k/N)$ by $\delta_k$. We have then

$$\lim_{N\to\infty}\frac{S_{1,2,\ldots,m}}{N} = \delta_1\delta_2\cdots\delta_m.$$

This will hold good also for each subset of the set of sequences (2). Let us imagine
$m$ circles with circumferences equal to 1. On the $k$-th circle the points $j\theta_k$ satisfying
condition (a) or (b) lie on two definite arcs of joint length $\delta_k$. From Theorem II it
follows that the ratio of the number of these $j$ for which conditions (a) or (b) are
satisfied on all circles, i.e. $S_{1,2,\ldots,m}$, to $N$ tends to the product $\delta_1\delta_2\cdots\delta_m$.

Theorem V: If $|1-2\gamma_k| \le \alpha < 1$ and $\theta'_k \ge \beta > 0$, $1-\theta'_k \ge \beta > 0$,
$|\tfrac12-\theta'_k| \ge \beta > 0$, where $\alpha$ and $\beta$ are constant numbers, then $|1-2\delta_k| \le A < 1$, where
$A$ is also a constant number.

Proof: First of all it should be pointed out that without loss of
generality we may assume $\alpha \ge \tfrac12$ and $\beta \le \tfrac14$. For the three cases
considered in the proof of Theorem III we have for $\delta_k$ and $|1-2\delta_k|$ (assuming, as
in the other proof, that $\gamma_k \le \tfrac12$)

case                                                   $\delta_k$             $|1-2\delta_k|$
(A)  $\theta'_k < \gamma_k \le \tfrac12$               $2\theta'_k$           $|1-4\theta'_k|$
(B)  $\gamma_k < \theta'_k < 1-\gamma_k$               $2\gamma_k$            $|1-4\gamma_k|$
(C)  $\gamma_k \le 1-\gamma_k < \theta'_k$             $2(1-\theta'_k)$       $|1-4(1-\theta'_k)|$

In the cases (A) and (C), from the conditions
$$\theta'_k \ge \beta, \quad 1-\theta'_k \ge \beta, \quad |\tfrac12-\theta'_k| \ge \beta \quad (\beta \le \tfrac14)$$
it follows, as will be easily seen, that
$$|1-2\delta_k| \le 1-4\beta < 1.$$
In the case (B) it follows from the condition $|1-2\gamma_k| \le \alpha$ and from the inequality
$\gamma_k \le \tfrac12$ that
$$\gamma_k \ge \tfrac12(1-\alpha).$$

Next it follows from the condition $|\tfrac12-\theta'_k| \ge \beta$ that
either $\theta'_k \le \tfrac12-\beta$ or $1-\theta'_k \le \tfrac12-\beta$.
In addition we have $\gamma_k < \theta'_k$ and $1-\gamma_k > \theta'_k$, i.e. $\gamma_k < 1-\theta'_k$.
Thus we have $\tfrac12(1-\alpha) \le \gamma_k < \tfrac12-\beta$.
Hence we obtain for the case $\gamma_k \le \tfrac14$
$$|1-4\gamma_k| \le 1-4(1-\alpha)/2 = 2\alpha-1 < 1,$$
and for the case $\gamma_k > \tfrac14$

$$|1-4\gamma_k| < 1-4\beta < 1.$$

From this it follows that if $A$ is equal to the greater of the two numbers $2\alpha-1$ and $1-4\beta$,
then we have

$$|1-2\delta_k| \le A < 1.$$
Theorem VI: Theorems III, IV and V hold good if conditions (a) and (b)
take on a more general form
(a) $j\theta_k-[j\theta_k] < \gamma_k$, $(j+a)\theta_k-[(j+a)\theta_k] > \gamma_k$,
(b) $j\theta_k-[j\theta_k] > \gamma_k$, $(j+a)\theta_k-[(j+a)\theta_k] < \gamma_k$,

where $a$ is a positive integer, and if in Theorem V $\theta'_k$ is replaced by
$\theta_k^{(a)} = a\theta_k-[a\theta_k]$.

These theorems may be proved as above by replacing $\theta'_k$ by $\theta_k^{(a)}$.
Paper received : December, 1959.
A STUDY OF LARGE SAMPLE TEST CRITERIA THROUGH
PROPERTIES OF EFFICIENT ESTIMATES


PART I: TESTS FOR GOODNESS OF FIT AND CONTINGENCY TABLES


By C. RADHAKRISHNA RAO
Indian Statistical Institute


SUMMARY. The existence of m.l. and m.l. equation estimates and their efficiency, in the case
of sampling from a finite multinomial distribution, have been established under conditions weaker than
those assumed by earlier writers. For instance, the existence of the second derivatives of the cell
probabilities as functions of the parameters is not necessary. The asymptotic distributions of the chi-square
goodness of fit test and test criteria for examining composite hypotheses in contingency tables have been
shown to depend mainly on the efficiency (in the new sense) of estimates used in the construction of the
test criteria. The m.l. estimate satisfies the efficiency condition and is, in the sense of second order efficiency,
more suitable for the construction of test criteria than other efficient estimates. A new test has been
proposed for examining the expected frequencies computed under two different hypotheses of a special
nature.


1. INTRODUCTION 


In two previous papers (Rao, 1960a, 1960b), the author gave a new formulation
of the concept of asymptotic efficiency of estimates of parameters and established an
optimum property (second order efficiency) of the maximum likelihood (m.l.) estimate.
This shows that although the class of estimation procedures providing efficient
estimates is very wide, the m.l. method is distinguishable from the rest by its maximum
second order efficiency. It has been observed that m.l. estimates (which are efficient
in the new sense) provide a good summary of data in large samples in the sense that
inference based on these estimates is equivalent to that obtained by utilizing the whole
data. The approach to the problem of estimation and the specific results obtained
have been based on the concepts introduced by Fisher (1921, 1926).


It is also well known that in the construction of large sample test criteria
efficient estimates play an important role. For instance, in obtaining a $\chi^2$ goodness
of fit, the expected values are calculated by inserting efficient estimates such as those
obtained by maximising likelihood or minimising $\chi^2$. The object of the present series
of articles is to show how the concept of efficiency in the new sense plays a key role in
the construction of large sample criteria and in the derivation of their asymptotic
distributions. It is also suggested that preference should be given to m.l. estimates,
because of their maximum second order efficiency, among all efficient estimates in the
construction of large sample criteria.


Incidentally, the existence of the m.l. estimates, their consistency, efficiency
and asymptotic normality of distribution have all been deduced, in the case of the
finite multinomial, under conditions weaker than those assumed by Cramér (1946);
for instance, without assuming second order differentiability of the cell probabilities
as functions of unknown parameters. The $\chi^2$ goodness of fit criterion is shown to have
the usual asymptotic distribution even under these weaker conditions.


2. DEFINITIONS AND ASSUMPTIONS 


Let $\pi_1, \ldots, \pi_k$, the probabilities in the $k$ cells of a multinomial distribution,
depend on a vector $\theta = (\theta_1, \ldots, \theta_q)$ of parameters belonging to a $q$ dimensional real
space $E^q$ or to a nondegenerate interval of $E^q$. If $p_1, \ldots, p_k$ are the observed
proportions in the $k$ cells for a sample of size $n$, the log likelihood of $\theta$ is proportional to

$$p_1\log\pi_1+\ldots+p_k\log\pi_k.$$

When $\pi_i$ are differentiable functions of $\theta$, the maximum likelihood (m.l.) equations are
defined by

$$z_r = \sum_i \frac{p_i}{\pi_i}\frac{\partial\pi_i}{\partial\theta_r} = 0, \quad r = 1, \ldots, q. \qquad (2.1)$$

The information matrix, which is the variance covariance matrix of $\sqrt{n}\,z_r$, is denoted by
$\Lambda = BB' = (i_{rs})$, where the matrix $B$ and $i_{rs}$ are defined as follows:

$$B = \left(\frac{1}{\sqrt{\pi_i}}\frac{\partial\pi_i}{\partial\theta_r}\right), \quad i_{rs} = \sum_i \frac{1}{\pi_i}\frac{\partial\pi_i}{\partial\theta_r}\frac{\partial\pi_i}{\partial\theta_s}. \qquad (2.2)$$

Unless otherwise stated, the variables $z_r$ and the elements $i_{rs}$ denote the values at
$\theta_0 = (\theta_1^0, \ldots, \theta_q^0)$, the true value of the parameter, supposed to be an interior point
of the admissible region of $\theta$, such that $\pi_i(\theta_0) > 0$ for each $i$.
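
As a concrete instance of the m.l. equation (2.1) and of the information matrix, consider a single parameter ($q = 1$) and three cells. The Python sketch below is an illustration under stated assumptions: the model $\pi(\theta) = (\theta^2,\ 2\theta(1-\theta),\ (1-\theta)^2)$, the observed proportions and all function names are hypothetical choices made for this example and are not taken from the paper. For this model the root of (2.1) has the closed form $\hat\theta = p_1+p_2/2$, and the information reduces to $2/\{\theta(1-\theta)\}$.

```python
def cell_probs(theta):
    """Cell probabilities pi_i(theta) of the illustrative three-cell model."""
    return [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]


def cell_derivs(theta):
    """Derivatives d pi_i / d theta of the same model."""
    return [2 * theta, 2 - 4 * theta, -2 * (1 - theta)]


def score(p, theta):
    """z = sum_i (p_i / pi_i) * d pi_i / d theta, the left side of (2.1)."""
    return sum(pi / q * d
               for pi, q, d in zip(p, cell_probs(theta), cell_derivs(theta)))


def information(theta):
    """i = sum_i (1 / pi_i) * (d pi_i / d theta)^2 for one parameter."""
    return sum(d * d / q
               for q, d in zip(cell_probs(theta), cell_derivs(theta)))


p = [0.3, 0.5, 0.2]                # hypothetical observed proportions
theta_hat = p[0] + p[1] / 2        # closed-form root of (2.1) for this model
print(theta_hat, score(p, theta_hat), information(theta_hat))
```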


We introduce the following assumptions. 


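Assumption 1: Every $\pi_i$ has continuous first derivatives $\partial\pi_i/\partial\theta_r$ at the
true value $\theta_0$.

Assumption 2a: Given a $\delta > 0$, it is possible to find an $\varepsilon > 0$ such that
$$\inf_{(\theta-\theta_0)^2 > \delta}\ \sum_i \pi_i(\theta_0)\log\frac{\pi_i(\theta_0)}{\pi_i(\theta)} \ge \varepsilon. \qquad (2.3)$$
Assumption 2b: $\pi_i(\theta) \ne \pi_i(\beta)$ for at least one $i$ when $\theta \ne \beta$, which is the
identifiability condition.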


It is easy to see that Assumption 2b $\Longrightarrow$ Assumption 2a, when the admissible
region of $\theta$ is closed and $\pi_i(\theta)$ are continuous functions of $\theta$. If the interval is open,
it may happen that 


|т,(0,)-т.(6)| -» 0, for each i ав Ө,» В -<0, 
Assumption 2a prevents such a thing happening and is in a certain sense a strong
identifiability condition.
Assumption 3: There exists an estimate $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_q)$ such that

i) plim $\hat\theta = \theta_0$,
ii) $\sqrt{n}\,d_r = \sqrt{n}(\hat\theta_r-\theta_r^0)$ has an asymptotic distribution (a.d.),
iii) the estimate $\hat\theta$ is efficient in the new sense (Rao, 1960a, 1960b), i.e.,

$$\operatorname{plim}\,[\sqrt{n}\,z_r-\sqrt{n}(i_{r1}d_1+\ldots+i_{rq}d_q)] = 0, \quad r = 1, \ldots, q \qquad (2.4)$$

or in matrix notation plim $(Z-\Lambda D) = 0$,

where $Z$ and $D$ denote the column vectors of the elements $\sqrt{n}\,z_r$ and $\sqrt{n}\,d_r$ respectively.
We may, in theory, allow $\hat\theta$ itself to depend on the unknown true value $\theta_0$, but it is
necessary for the application of the results derived that $\hat\pi_i = \pi_i(\hat\theta)$, the estimate of
$\pi_i(\theta)$, should be independent of the assumed true value.

Assumption 3 may appear as a blanket assumption but is deliberately
introduced to demonstrate the key role played by the efficiency condition
(2.4). Whatever may be the method by which $\hat\theta$ is obtained, the propositions considered
in the paper are valid provided only that it satisfies Assumption 3.

We will, in fact, show in Section 3.5 that Assumptions 1 and 2b imply Assumption 3
when the rank of $\Lambda$ is $q$, and Assumptions 1 and 2a imply the stronger result
that m.l. estimates exist and satisfy the conditions of Assumption 3 when the rank of
$\Lambda$ is $q$. The conditions under which the results relating to existence of m.l. estimates,
their consistency, efficiency (implying asymptotic normality of distribution) are obtained
are much weaker than those considered by earlier writers. It can be proved that under
the same or similar less restrictive conditions other methods of estimation such as
minimum $\chi^2$ (Neyman, 1949; Rao, 1955a), modified minimum $\chi^2$ (Neyman, 1949),
minimum discrepancy (Haldane, 1951), minimum Hellinger distance (Rao, 1960b),
also provide estimates satisfying Assumption 3.


Situations, however, exist where Assumption 3 is satisfied without even
the rank of $\Lambda$ being $q$, so that the treatment of the paper is, in some respects, more
general.


We have not made any assumption about the rank of $\Lambda$ at the true value $\theta_0$
and the proofs adopted are valid whatever may be the rank of $\Lambda$. If the rank is full,
i.e. equal to $q$, then Assumption 3 is equivalent to

$$\operatorname{plim}\,(D-\Lambda^{-1}Z) = 0 \qquad (2.5)$$

and the proofs of the propositions considered are extremely simple when (2.5) holds.
It has been shown in (Rao, 1960b) that the rate of convergence in (2.4) is highest
when $\hat\theta$ is an m.l. estimate, which indicates some merit in using an m.l. estimate in
preference to any other efficient estimate.
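3. CHI-SQUARE GOODNESS OF FIT AND ASSOCIATED PROBLEMS


3.1. Preliminary lemmas. We shall assemble, to begin with, the key notations
and their relationships to be used in the lemmas. The transpose of a column vector
is indicated by a prime. The probabilities $\pi_i$ and their derivatives are taken at the
true value $\theta_0$ of the parameter unless stated otherwise.

$$V' = \left(\frac{\sqrt{n}(p_1-\pi_1)}{\sqrt{\pi_1}}, \ldots, \frac{\sqrt{n}(p_k-\pi_k)}{\sqrt{\pi_k}}\right)$$

$$U' = \left(\frac{\sqrt{n}(\hat\pi_1-\pi_1)}{\sqrt{\pi_1}}, \ldots, \frac{\sqrt{n}(\hat\pi_k-\pi_k)}{\sqrt{\pi_k}}\right)$$

$$Z' = (\sqrt{n}\,z_1, \ldots, \sqrt{n}\,z_q), \quad z_r = \sum_i \frac{p_i}{\pi_i}\frac{\partial\pi_i}{\partial\theta_r}$$

$$D' = (\sqrt{n}\,d_1, \ldots, \sqrt{n}\,d_q), \quad d_r = \hat\theta_r-\theta_r^0$$

$$B\,(q\times k) = \left(\frac{1}{\sqrt{\pi_i}}\frac{\partial\pi_i}{\partial\theta_r}\right)$$

so that $BB' = \Lambda$, the dispersion matrix of $Z$, and $Z = BV$.

Observe that $U \sim B'D$, since

$$\frac{\sqrt{n}(\hat\pi_i-\pi_i)}{\sqrt{\pi_i}} \sim \frac{\sqrt{n}}{\sqrt{\pi_i}}\sum_r \frac{\partial\pi_i}{\partial\theta_r}\,d_r,$$

which follows from the continuity of the derivatives of $\pi_i(\theta)$, where $\sim$ stands for
equivalence of asymptotic distributions.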


By cov$(X, Y)$, where $X$ and $Y$ are vectors, is meant the matrix of covariances

$$\begin{pmatrix}\operatorname{cov}(x_1, y_1) & \ldots & \operatorname{cov}(x_1, y_k)\\ \vdots & & \vdots\\ \operatorname{cov}(x_m, y_1) & \ldots & \operatorname{cov}(x_m, y_k)\end{pmatrix}.$$

For instance, by straight computation, we find
$$\operatorname{cov}(V, Z) = B'.$$

Lemma 1: The a.d. (asymptotic distribution) of $Z$ is $q$-variate normal with
dispersion matrix $\Lambda$.

This result is a consequence of every linear function $Z'L$ ($L$ being a non-random
vector) having an asymptotic normal distribution (Wald and Wolfowitz,
1944). The distribution is, however, singular when the rank of $\Lambda$ is not full.
Lemma 2a: Under the Assumptions 1 and 3, the a.d. of $D'\Lambda D$ is $\chi^2(q)$ when
the rank of $\Lambda$ is full.

If the rank of $\Lambda$ is $q$, then by (2.5)

$$D \sim \Lambda^{-1}Z, \quad D'\Lambda D \sim Z'\Lambda^{-1}Z.$$
The a.d. of $Z'\Lambda^{-1}Z$ is $\chi^2(q)$ in virtue of the result of Lemma 1, and the same is then
true of $D'\Lambda D$.

Lemma 2b: Under the Assumptions 1 and 3, the a.d. of $D'\Lambda D$ is $\chi^2(t)$, where
$t$ is the rank of $\Lambda$.

There exists a non-singular matrix $C$ such that $C\Lambda C' = \Delta$, where $\Delta$ is diagonal
and of order $q$, but with only $t$ elements $> 0$ and the rest equal to zero. Denoting
by $\nabla$ the matrix obtained by replacing the non-zero elements of $\Delta$ by their reciprocals
we find¹

$$\Lambda = \Lambda C'\nabla C\Lambda,$$
$$D'\Lambda D = D'\Lambda C'\nabla^{1/2}\nabla^{1/2}C\Lambda D \sim Z'C'\nabla^{1/2}\nabla^{1/2}CZ. \qquad (3.1.1)$$

The a.d. of $D'\Lambda D$ is the same as that of the sum of squares of the linear functions $Z'C'\nabla^{1/2}$ of
$Z$. The dispersion matrix of $Z'C'\nabla^{1/2}$ is
$$\nabla^{1/2}C\Lambda C'\nabla^{1/2} = \nabla^{1/2}\Delta\nabla^{1/2} = I_t,$$
where $I_t$ is a diagonal matrix with $t$ elements equal to unity and the rest to zero.
Hence the a.d. of (3.1.1) is $\chi^2(t)$ and so is that of $D'\Lambda D$.

Lemma 3: Under the Assumptions 1 and 3 the a.d. of $U'U = n\sum(\hat\pi_i-\pi_i)^2/\pi_i$
is $\chi^2(t)$, where $t$ = rank $\Lambda$.

Since $U'U \sim D'BB'D = D'\Lambda D$, the result follows from Lemma 2a or 2b.
Lemma 4: The Assumptions 1 and 3 imply plim $(V-U)'U = 0$ or, writing
out in full,
$$\operatorname{plim}\ n\sum_i \frac{(p_i-\hat\pi_i)(\hat\pi_i-\pi_i)}{\pi_i} = 0.$$
This follows from
$$(V-U)'U \sim V'B'D-D'\Lambda D = (Z'-D'\Lambda)D \to 0 \text{ in probability by (2.4)}.$$

Lemma 5: Under the same conditions as in Lemma 4, a.c.$(V-U, U) = 0$.

Since $BB' = \Lambda$, there exists a matrix $G$ such that $B = \Lambda G$, and therefore
$U \sim B'D = G'\Lambda D \sim G'Z$. Consider

$$\text{a.c.}(V-G'Z,\ G'Z) = \text{a.c.}(V, G'Z)-\text{a.c.}(G'Z, G'Z)$$
$$= c(V, Z)G-G'\Lambda G$$
$$= B'G-G'\Lambda G = (B'-G'\Lambda)G = 0.$$


¹ The matrix $C'\nabla C$ is defined to be a pseudo-inverse of $\Lambda$. The properties of such inverses and their
use in statistics are discussed by the author (Rao, 1955b). The relationship $\Lambda = \Lambda C'\nabla C\Lambda$ is
established by post- and pre-multiplying with $C'$ and $C$ and using the relation $C\Lambda C' = \Delta$.
Lemma 6*: Let $X$ be a $p$-variate normal variable with mean zero and dispersion
matrix $H$ of rank $f \le p$. If $Q_1(X)$ and $Q_2(X)$ are two quadratic forms such that
$[Q_1(X)+Q_2(X)]$ is $\chi^2(c)$, $Q_1(X)$ is $\chi^2(b)$ and $Q_2(X)$ is non-negative, then $Q_2(X)$ is $\chi^2(c-b)$.

Transform $X$ to $W$ such that $w_1, \ldots, w_f$ are each independently $N(0, 1)$ and
$w_{f+1}, \ldots, w_p$ are all $N(0, 0)$. The transformed quadratic forms $Q_1(W)$, $Q_2(W)$ are,
therefore, equivalent, with probability one, to those obtained by omitting $w_{f+1}, \ldots, w_p$.
The problem reduces to the case of $H = I$ and $p = f$.

Consider an orthogonal transformation from $X$ to $Y$ such that

$$X'X = Y'Y \quad\text{and}\quad Q_1(X)+Q_2(X) = \sum\lambda_i y_i^2.$$
Since $Q_1(X)+Q_2(X)$ is $\chi^2(c)$, it follows that $\lambda_i = 1$ for $i \le c$ and zero otherwise. Hence
$$y_1^2+\ldots+y_c^2 = Q_1(Y)+Q_2(Y). \qquad (3.1.2)$$
Since $Q_2(Y)$ is non-negative, and $Q_1(Y)$ is non-negative being distributed as $\chi^2$, $Q_1(Y)$ and
$Q_2(Y)$ contain only the variables $y_1, \ldots, y_c$. The problem is thus reduced to the case
of $c$ independent normal variables, each with zero mean and unit variance, together
with the condition (3.1.2). Under such conditions, given that $Q_1(Y)$ is $\chi^2(b)$, it
follows that $Q_2(Y)$ is $\chi^2(c-b)$, as may be proved by considering an orthogonal
transformation from $Y$ to $Z$ such that
$$\sum y_i^2 = \sum z_i^2 \quad\text{and}\quad Q_1(Y) = z_1^2+\ldots+z_b^2.$$
3.2. The goodness of fit test.
Theorem 1: Under the Assumptions 1 and 3, the a.d. of
$$\chi^2 = \sum_i \frac{n(p_i-\hat\pi_i)^2}{\hat\pi_i}$$
is $\chi^2(k-t-1)$, where $t$ = rank $\Lambda$.

Consider

$$\sum_i \frac{n(p_i-\pi_i)^2}{\pi_i} = \sum_i \frac{n(p_i-\hat\pi_i)^2}{\pi_i}+\sum_i \frac{n(\hat\pi_i-\pi_i)^2}{\pi_i}+\text{product terms}$$

or in matrix notation,

$$V'V \sim (V-U)'(V-U)+U'U \qquad (3.2.1)$$
since the product term $\to 0$ in probability by Lemma 4. The a.d. of $V'V$ is $\chi^2(k-1)$
and that of $U'U$ is $\chi^2(t)$ by Lemma 3. Hence by an application of Lemma 6, $(V-U)'(V-U)$
is $\chi^2(k-t-1)$. Alternatively, since a.c.$(V-U, U) = 0$ by Lemma 5,
$(V-U)'(V-U)$ and $U'U$ are asymptotically independently distributed. Hence
the a.d. of $(V-U)'(V-U)$ is $\chi^2(k-1-t)$. In order to prove the result of Theorem
1 we observe that

$$\sum_i \frac{n(p_i-\hat\pi_i)^2}{\hat\pi_i} \sim \sum_i \frac{n(p_i-\hat\pi_i)^2}{\pi_i}$$

because of continuity of $\pi_i(\theta)$.


* This lemma was proved and included at the suggestion of my colleague, Dr. S. K. Mitra. For
proving Theorem 1, we need either Lemma 5 or Lemma 6.
General remarks on the result of Theorem 1. The decomposition of the total
$\chi^2$ in (3.2.1) deserves some comments. If the observed frequency in any cell is indicated
by $O = np$, the hypothetical frequency by $H = n\pi$ and the estimated (or expected)
by $E = n\hat\pi$, the equation (3.2.1) may be written

$$\sum\frac{(O-H)^2}{H} \sim \sum\frac{(O-E)^2}{E}+\sum\frac{(E-H)^2}{H} \qquad (3.2.2)$$
$$\chi^2(k-1) = \chi^2(k-t-1)+\chi^2(t).$$

For an application of the $\chi^2$ goodness of fit test, it is necessary to enquire whether
the rank of $\Lambda$ is constant over the admissible set of parameters, since the true value
of the parameter $\theta$ is unknown. What is asserted by Theorem 1 is that the degrees
of freedom of the a.d. depend only on the rank of $\Lambda$ at the true value, although the
rank of $\Lambda$ may not be the same at all points.
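
The decomposition can be made concrete with a small computation. The Python sketch below is an illustration only: the function name `chi_terms` and the three-cell frequencies are hypothetical and are not taken from the paper. It uses the exact algebraic identity behind (3.2.1), in which every term is divided by the hypothetical frequency $H$, and reports the product term separately; (3.2.2) then follows asymptotically, once the product term is negligible and $H$ is replaced by $E$ in the denominator of the first term.

```python
def chi_terms(observed, hypothetical, expected):
    """Return (total, fit, estimate, cross), all with denominators H, so that
    total = fit + estimate + cross holds exactly."""
    rows = list(zip(observed, hypothetical, expected))
    total = sum((o - h) ** 2 / h for o, h, e in rows)
    fit = sum((o - e) ** 2 / h for o, h, e in rows)
    estimate = sum((e - h) ** 2 / h for o, h, e in rows)
    cross = 2 * sum((o - e) * (e - h) / h for o, h, e in rows)
    return total, fit, estimate, cross


# Hypothetical frequencies for n = 200 in three cells.
O = [60.0, 100.0, 40.0]    # observed
H = [50.0, 100.0, 50.0]    # hypothetical (true-value) frequencies
E = [60.5, 99.0, 40.5]     # frequencies estimated under the model

total, fit, estimate, cross = chi_terms(O, H, E)
print(total, fit, estimate, cross)
```

In this example the product term is small compared with the other terms, which is the finite-sample counterpart of Lemma 4.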

The statistic $\sum(E-H)^2/H$ may be used to test the hypothesis that a particular
set of probabilities is true, given that the admissible set has the representation $\pi_i(\theta)$.

3.3. Test for deviations in any particular set of cells. Cochran (1954), and Rao
and Chakravarty (1956) considered problems where attention is concentrated
on the deviation of the observed from the expected frequency in a particular cell or
the deviations in a particular set of cells. Tests for examining the significance of such
deviations are extremely useful as they may lead to a suitable explanation of the
departure from hypothesis when indicated by a large value of the $\chi^2$ goodness of fit
test. The result of the following lemma will be useful in computing the variances
and covariances of the deviations.

From Lemma 5 we have

$$\text{a.c.}(V-U, U) = 0 \Longrightarrow \text{a.c.}(V, U) = \text{a.c.}(U, U).$$
Hence
$$\text{a.c.}(V-U, V-U) = c(V, V)-\text{a.c.}(U, U)$$
$$= c(V, V)-G'\Lambda G, \text{ where } G \text{ is as defined in Lemma 5} \qquad (3.3.1)$$
$$= c(V, V)-B'\Lambda^{-1}B \text{ when rank } \Lambda = q.$$

We have, thus, very simple formulae for finding the asymptotic variances and covariances
of the deviations $(p_i-\hat\pi_i)$. For instance, to test the departure in the $r$-th cell we
need

$$\text{a.v.}[\sqrt{n}(p_r-\hat\pi_r)] = v[\sqrt{n}(p_r-\pi_r)]-\text{a.v.}[\sqrt{n}(\hat\pi_r-\pi_r)]$$
$$= \pi_r(1-\pi_r)-\text{a.v.}[\sqrt{n}(\hat\pi_r-\pi_r)].$$
When the rank of $\Lambda$ is $q$, the last expression can be evaluated in terms of $\Lambda^{-1} = (i^{st})$,
which represents the asymptotic variances and covariances of the estimates $\hat\theta_s$:

$$\text{a.v.}[\sqrt{n}(\hat\pi_r-\pi_r)] = \sum_s\sum_t \frac{\partial\pi_r}{\partial\theta_s}\frac{\partial\pi_r}{\partial\theta_t}\,i^{st}.$$

The deviation $p_r-\hat\pi_r$ can be tested using its standard error.
If the deviations in a number of cells have to be tested simultaneously we have
to compute the variance-covariance matrix of the deviations by using the result
(3.3.1). With the inverse of the matrix so computed we can construct a quadratic
form in the deviations, which has a $\chi^2$ distribution with d.f. equal to the number of
deviations to be examined. It may, however, happen that the rank of the variance-covariance
matrix of the deviations is not full, in which case the significance of all the
deviations cannot be examined simultaneously. We consider only those deviations
for which the dispersion matrix remains non-singular.


3.4. Test for the hypothesis that the parameters belong to a subset. We need
some further assumptions to consider the problem of testing whether the parameter
$\theta$ belongs to a subset, assuming the specification $\pi_i(\theta)$ to be true.


Assumption 4: The locus of $\theta$ in the subset is represented by
$$\theta_i = g_i(\alpha_1, \ldots, \alpha_s), \quad i = 1, \ldots, q,$$

where $g_i$ admit continuous first derivatives and $(\alpha_1, \ldots, \alpha_s)$ is confined to $E^s$ or to
some non-degenerate interval in $E^s$.


Assumption 5: There exist consistent estimates $\hat\alpha_1, \ldots, \hat\alpha_s$ having asymptotic
distributions such that

$$\operatorname{plim}\ \sqrt{n}\,[y_r-j_{r1}(\hat\alpha_1-\alpha_1)-\ldots-j_{rs}(\hat\alpha_s-\alpha_s)] = 0, \quad r = 1, \ldots, s,$$
where $y_r = \frac{\partial}{\partial\alpha_r}\sum p_i\log\pi_i$, and $(j_{rs})$ is the information matrix for the parameters
$\alpha_1, \ldots, \alpha_s$.


Assumption 5 is the same as Assumption 3 in terms of the new parameters.
Lemma 7: The Assumptions 1, 3, 4 and 5 imply
plim n X (т„—л)(л—л„) | 0. 
Пи 


By the same argument as in Lemma 4 we have


plim na (р,—пи)(пь—пь) zo 


T, 
and, therefore, Lemma 7 is true if
plim s X текш =@ 2. (841) 
“ 
Since утат) ~ үл» Om, (00—05), ti it i 
да, (212250; © prove (3.4.1) it is enough to show 
на М Mey) ч 
р > * ‘а, 9% т.) Т 0. 227 (8.4.2) 
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The left hand side of (3.4.2) is equal to 


р 0g; с Ут дт A 
lim x 9 x V^ 9m т,). 
PA ады > Ty 20,9 wm) 


= в Ая My („бя 
> "Qa; plim 2 Tu 20, (Pu Ty) 
eb 20 plim аа айй) = 0 Бу (24): 


Theorem 2: Under the Assumptions 1, 3, 4 and 5, the a.d. of


n Xe 
т 
is $\chi^2(t-a)$, where $t$ = rank $\Lambda$ and $a$ = rank $(j_{rs})$.
The result follows from the decomposition
2 x inn 1) (пат)? уу (та)? 2. (844) 
Tm Tm Пт 

where the left hand side is asymptotically $\chi^2(t)$ by Theorem 1, and by Lemma 3, the
second term on the right hand side is asymptotically $\chi^2(a)$. Lemma 7 justifies the
expansion (3.4.4) and finally an application of Lemma 6 gives the desired result.


General comments on the result of Theorem 2. Tf we represent by E, = 77, 
expected or estimated frequency under the second hypothesis (that the parameter 
0 is confined to a subset) the test criterion is 


x UE ... (8.4.5) 
2 
where it may be noted that Æ, is written for Æ, the expected frequency under the first 
specification. Combining the decompositions (3.2.2) and (3.4.4) we have 


Ово ues (®,—Е„}# | у aH : 

= FOSY T, МЕ Е, + ES s. (3.4.0) 
Some years ago, the author (Rao, 1948) suggested a general criterion from which several
large sample tests having asymptotic $\chi^2$ distribution were deduced as special cases.
The test (3.4.5) is, however, different from the previous test. Under the null hypothesis
both are asymptotically equivalent and so also other tests proposed for the purpose
(Mitra, 1955; Rao and Chakravarty, 1956). But differences may exist in their
relative efficiencies in finite samples.

3.5. Sufficient conditions for the validity of the Assumption 3. The Assumption
3 or 5 as stated may be difficult to verify and it may, therefore, be useful to have
some simple conditions under which they are true. The following lemmas provide
some answers.
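Lemma 8: The Assumptions 1 and 2a, together with the non-singularity of $\Lambda$,
imply the Assumption 3.

We first consider the set $S$ of $\theta$ such that

$$\sum[\pi_i(\theta_0)+\varepsilon_1]\log\pi_i(\theta_0) > \sum[\pi_i(\theta_0)-\varepsilon_1]\log\pi_i(\theta),$$

where $\varepsilon_1 > 0$ is fixed to make $\pi_i(\theta_0)-\varepsilon_1 > 0$. For $p_i$ such that $\pi_i(\theta_0)-\varepsilon_2 < p_i < \pi_i(\theta_0)+\varepsilon_2$,
where $\varepsilon_2 < \varepsilon_1$,

$$\sum p_i\log\pi_i(\theta_0) > \sum[\pi_i(\theta_0)+\varepsilon_1]\log\pi_i(\theta_0) > \sum[\pi_i(\theta_0)-\varepsilon_1]\log\pi_i(\theta) > \sum p_i\log\pi_i(\theta).$$
For $\theta$ outside $S$, and $(\theta-\theta_0)^2 > \delta$, the quantities $\log[\pi_i(\theta_0)/\pi_i(\theta)]$ are bounded, in which case
$$\sum p_i\log\frac{\pi_i(\theta_0)}{\pi_i(\theta)}$$
can be made uniformly close (within $\varepsilon$ difference) to
$$\sum\pi_i(\theta_0)\log\frac{\pi_i(\theta_0)}{\pi_i(\theta)}$$
by choosing $p_i$ sufficiently close to $\pi_i(\theta_0)$, say $\pi_i(\theta_0)-\varepsilon_3 < p_i < \pi_i(\theta_0)+\varepsilon_3$. Since by
Assumption 2a

$$\inf_{(\theta-\theta_0)^2 > \delta}\ \sum\pi_i(\theta_0)\log\frac{\pi_i(\theta_0)}{\pi_i(\theta)} \ge \varepsilon > 0,$$

we have

$$\sum p_i\log\frac{\pi_i(\theta_0)}{\pi_i(\theta)} > 0 \text{ for all } \theta \text{ such that } (\theta-\theta_0)^2 > \delta \qquad (3.5.1)$$

when $\pi_i(\theta_0)-\varepsilon_4 < p_i < \pi_i(\theta_0)+\varepsilon_4$, where $\varepsilon_4$ is the smaller of $\varepsilon_2$ and $\varepsilon_3$.


Since $\pi_i(\theta_0) > 0$, and $\pi_i(\theta)$ are continuous, the result (3.5.1) shows that the
supremum of $\sum p_i\log\pi_i(\theta)$ is attained in the open interval $(\theta-\theta_0)^2 < \delta$. As $\delta$ can be
chosen arbitrarily small, and $p_i$ is close to $\pi_i(\theta_0)$ with probability one, the value $\hat\theta$ at
which the supremum is attained provides a consistent estimate of $\theta_0$. So far we have
used only Assumption 2a and continuity of $\pi_i(\theta)$.


If $\pi_i(\theta)$ are differentiable, the derivative of $\sum p_i\log\pi_i(\theta)$ vanishes at $\hat\theta$. This
shows that the m.l. equation has at least one root which maximises the likelihood
and which provides a consistent estimate.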


We shall use the condition that rank $\Lambda$ is $q$, and the continuity of the derivatives,
to prove Assumption 3. The rank of the matrix $\Lambda$, however, does not play a
significant role in the proof. It is felt that a weaker condition than this may be
sufficient for this purpose.


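The m.l. equation is

$$\sum_i \frac{p_i}{\hat\pi_i} \frac{\partial \hat\pi_i}{\partial \theta_r} = 0$$

or

$$0 = \sum_i \frac{\sqrt{n}(p_i-\pi_i)}{\hat\pi_i} \frac{\partial \hat\pi_i}{\partial \theta_r} - \sum_i \frac{\sqrt{n}(\hat\pi_i-\pi_i)}{\hat\pi_i} \frac{\partial \hat\pi_i}{\partial \theta_r}. \qquad (3.5.2)$$

The first term on the right hand side of (3.5.2) is $\sim \sqrt{n}\, z_r$. Using the expansion

$$\sqrt{n}(\hat\pi_i-\pi_i) = \sum_s \left( \frac{\partial \pi_i}{\partial \theta_s} + \epsilon_{is} \right) \sqrt{n}\, d_s, \text{ where } \epsilon_{is} \to 0 \text{ as } \hat\theta \to \theta_0,$$

the second term on the right hand side of (3.5.2) becomes

$$\sqrt{n} \sum_s j_{rs} d_s \qquad (3.5.3)$$

where

$$j_{rs} = \sum_i \frac{1}{\hat\pi_i} \frac{\partial \hat\pi_i}{\partial \theta_r} \left( \frac{\partial \pi_i}{\partial \theta_s} + \epsilon_{is} \right) = i_{rs} + \eta_{rs}, \quad \eta_{rs} \to 0 \text{ as } \hat\theta \to \theta_0 \qquad (3.5.4)$$

$$\to i_{rs} \text{ as } \hat\theta \to \theta_0.$$

We may write the equation (3.5.2) as the equivalence

$$\sqrt{n}\, z_r \sim \sum_s j_{rs} \sqrt{n}\, d_s. \qquad (3.5.5)$$

By the assumption that $|i_{rs}| \neq 0$, when $\hat\theta$ is sufficiently close to $\theta_0$, $|j_{rs}| \neq 0$ in virtue
of (3.5.4). Therefore, the equivalence (3.5.5) may be written

$$\sqrt{n}\, d_r \sim \sum_s j^{rs} \sqrt{n}\, z_s, \quad (j^{rs}) = (j_{rs})^{-1} \qquad (3.5.6)$$

$$\sim \sum_s i^{rs} \sqrt{n}\, z_s \text{ as } \hat\theta \to \theta_0 \text{ in probability} \qquad (3.5.7)$$

which shows that the a.d. of $\sqrt{n}\, d_1, \ldots, \sqrt{n}\, d_q$ is multivariate normal with dispersion
matrix $A^{-1}$. Inverting the relation (3.5.7) we have

$$\sqrt{n}\, z_r \sim \sqrt{n} \sum_s i_{rs} d_s$$

which is (iii) of the Assumption 3.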

The Assumption 2 used in Lemma 8 is important as it specifies the condition
under which we can assert that the m.l. estimate has the properties mentioned in
Assumption 3. But if merely the existence of estimates satisfying Assumption 3
has to be established, Assumption 2 may be replaced by the weaker identifiability
condition.

Lemma 9: Assumptions 1 and 2b imply Assumption 3. 

We consider a sphere of radius $\delta$ round $\theta_0$. Since

$$\inf \sum_i \pi_i(\theta_0) \log \frac{\pi_i(\theta_0)}{\pi_i(\theta)} \qquad (3.5.8)$$

over the sphere is attained, because of continuity of $\pi_i(\theta)$, at some point on the sphere,
the identifiability Assumption 2b ensures that the expression (3.5.8) is greater than
$\epsilon > 0$. The argument of Lemma 8 applied to points over the chosen sphere shows
that the likelihood at $\theta_0$ exceeds the likelihood of any point on the sphere with
probability unity as $n \to \infty$. This shows that a local maximum of the likelihood is attained
at some point inside the sphere. The first derivative of log likelihood vanishes at that
point. Hence there exists a root of the m.l. equation which is consistent. The rest
of the argument is as in Lemma 8.


3.6. Goodness of fit tests when the Assumption 3 is not satisfied. If the estimated
parameters do not satisfy the Assumption 3 the statistic $\sum (O-E)^2/E$ need not have a
$\chi^2$ distribution. But in some cases, as when the method of moments is followed, one
may construct an alternative statistic with its a.d. as $\chi^2$ on the same d.f. as the goodness
of fit $\chi^2$.


Let the parameters be estimated by the method of moments, i.e., by equating
the observed moments of the grouped distribution to the hypothetical values for
the grouped distribution. As the estimating equations are linear in the frequencies,
the estimated deviations

$$p_i - \hat\pi_i \qquad (3.6.1)$$

are subject to as many linear restrictions as there are independent parameters, besides
their sum being zero. If the matrix of derivatives of $\pi_i$ has rank q, the asymptotic
dispersion matrix of the deviations will have rank (k-q-1). We need only choose
(k-q-1) independent deviations and, using the reciprocal of their dispersion matrix,
construct the quadratic form. This has a $\chi^2$ distribution with (k-q-1) d.f. The
asymptotic dispersion matrix of the deviations can be easily computed by the usual
methods.


We can also test the significance of deviations in any particular cell or devia- 
tions in a particular set of cells as in Section 3.3 using the asymptotic variances and 
covariances of the deviations (3.6.1). 


There is another situation where the parameters are estimated by an efficient
method utilizing the individual observations and not simply the observed frequencies
in certain class intervals. In such a case the statistic $\sum (O-E)^2/E$, when used as
$\chi^2(k-q-1)$, overestimates significance. The extent to which this happens has been
studied by Chernoff and Lehmann (1954). An alternative expression is given for the
excess in terms of the difference between estimates obtained in two different ways.

Let us indicate by $\theta_r^*$ the estimate of $\theta_r$ and by $\pi_i^*$ that of $\pi_i$ obtained from the
original observations by an efficient method of estimation such as the m.l. The
information matrix for a single observation is denoted by $(j_{rs})$, reserving $(i_{rs})$ for the
grouped distribution. Let $y_r = n^{-1}(\partial \log L/\partial \theta_r)$, where $L = f(x_1, \theta) f(x_2, \theta) \ldots f(x_n, \theta)$
and $f(x, \theta)$ is differentiable. We make the following assumption regarding the estimate
$\theta_r^*$.

Assumption 6:

$$\text{plim}\, |\sqrt{n}\, y_r - j_{r1} \sqrt{n}(\theta_1^*-\theta_1) - \ldots - j_{rq} \sqrt{n}(\theta_q^*-\theta_q)| = 0, \quad r = 1, \ldots, q.$$
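Lemma 11: Under the Assumptions 3 and 6

(i) $\text{a.v.}[\sqrt{n}(p_i-\pi_i^*)] = v[\sqrt{n}(p_i-\pi_i)] - \text{a.v.}[\sqrt{n}(\pi_i^*-\pi_i)]$

(ii) $\text{a.c.}[\sqrt{n}(p_i-\pi_i^*), \sqrt{n}(p_j-\pi_j^*)] = c[\sqrt{n}(p_i-\pi_i), \sqrt{n}(p_j-\pi_j)] - \text{a.c.}[\sqrt{n}(\pi_i^*-\pi_i), \sqrt{n}(\pi_j^*-\pi_j)]$

(iii) $\text{plim}\; n \sum_i \dfrac{(p_i-\hat\pi_i)(\hat\pi_i-\pi_i^*)}{\pi_i} = 0.$

The results (i) and (ii) are immediate consequences of the equations

$$\text{a.c.}[\sqrt{n}(p_r-\pi_r^*), \sqrt{n}(\pi_s^*-\pi_s)] = 0 \text{ for all } r \text{ and } s \qquad (3.6.2)$$

which are true if

$$\text{a.c.}[\sqrt{n}(p_r-\pi_r^*), \sqrt{n}\, y_s] = 0 \text{ for all } r \text{ and } s. \qquad (3.6.3)$$

Let us assume for simplicity that the ranks of $(j_{rs})$ and $(i_{rs})$ are both equal to q, the
number of parameters. To prove (3.6.3) we have to show that

$$\text{a.c.}[\sqrt{n}(p_r-\pi_r), \sqrt{n}\, y_s] = \text{a.c.}[\sqrt{n}(\pi_r^*-\pi_r), \sqrt{n}\, y_s] = \frac{\partial \pi_r}{\partial \theta_s} \qquad (3.6.4)$$

(the last value being obtained by expanding $\pi_r^*$ in terms of $y_i$). Since $E(p_r) = \pi_r$, i.e.,

$$\int p_r f(x_1) \ldots f(x_n)\, dx_1 \ldots dx_n = \pi_r,$$

differentiating with respect to $\theta_s$, we obtain

$$\int p_r \frac{\partial \log L}{\partial \theta_s} L\, dx_1 \ldots dx_n = \int (\sqrt{n}\, p_r)(\sqrt{n}\, y_s) L\, dx_1 \ldots dx_n = \frac{\partial \pi_r}{\partial \theta_s}. \qquad (3.6.5)$$

Comparing (3.6.4) and (3.6.5) we find (3.6.3) is true. Further

$$\sqrt{n}(\hat\pi_i-\pi_i^*) \sim \sqrt{n} \sum_s \frac{\partial \pi_i}{\partial \theta_s} (\hat\theta_s-\theta_s^*).$$

Hence result (iii) of the lemma follows if

$$\text{plim}\, \sqrt{n} \sum_i \frac{1}{\pi_i} \frac{\partial \pi_i}{\partial \theta_s} (p_i-\hat\pi_i) = 0$$

which is true in virtue of (3.4.3).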

The results of Lemma 11 are important in many ways. For instance the
significance of the deviations in any given set of cells can be tested, although the
estimates are not obtained from the grouped distribution, since the variances and
covariances of the deviations can be easily computed by using the results (i) and (ii). Thus

$$\text{a.v.}[\sqrt{n}(p_i-\pi_i^*)] = \pi_i(1-\pi_i) - \sum_r \sum_s \frac{\partial \pi_i}{\partial \theta_r} \frac{\partial \pi_i}{\partial \theta_s} j^{rs}$$

$$\text{a.c.}[\sqrt{n}(p_i-\pi_i^*), \sqrt{n}(p_j-\pi_j^*)] = -\pi_i \pi_j - \sum_r \sum_s \frac{\partial \pi_i}{\partial \theta_r} \frac{\partial \pi_j}{\partial \theta_s} j^{rs}.$$

It may be observed that the dispersion matrix of all the deviations $(p_i-\pi_i^*)$ may have
the maximum rank (k-1), so that a $\chi^2$ based on (k-1) degrees of freedom may be
constructed to test goodness of fit. But this is not likely to be an efficient test. On
the other hand the statistic

$$n \sum_i \frac{(p_i-\pi_i^*)^2}{\pi_i^*}$$

need not have a $\chi^2$ distribution asymptotically as shown below. We use the result
(iii) of Lemma 11 to obtain the decomposition

$$n \sum_i \frac{(p_i-\pi_i^*)^2}{\pi_i} \sim n \sum_i \frac{(p_i-\hat\pi_i)^2}{\pi_i} + n \sum_i \frac{(\hat\pi_i-\pi_i^*)^2}{\pi_i}.$$

The first term on the right hand side is asymptotically $\chi^2(k-q-1)$ as shown in
Theorem 1. The second term, which is asymptotically independent of the first, provides
the excess and depends on the numerical difference between estimates obtained in
two ways.


4. TESTS IN CONTINGENCY TABLES


Problems associated with a single multinomial distribution have been fully
discussed in the earlier sections. Another group of problems, which is important in
practice, is related to contingency tables, or independent samples from a number of
multinomial distributions. Let us suppose that we have samples of sizes $n_1, \ldots, n_m$
($n_1 + \ldots + n_m = n$) from m finite multinomial populations, all not necessarily having
an equal number of cells. One of the problems is to test the hypothesis that the cell
probabilities could be represented in terms of q parameters, $\theta_1, \ldots, \theta_q$. The likelihood
of the parameters is the product of the likelihoods

$$L_1 L_2 \cdots L_m$$

corresponding to the m samples. Let us define


1 дю, 
ni 00; 


Зи == 


and $(i_{rs}^{(t)})$ is the information matrix per single observation from the t-th multinomial
distribution. We make the following assumptions.


Assumption 7: The rank of $(g_{rs})$ is q, where $g_{rs} = \sum_t \lambda_t i_{rs}^{(t)}$.


Assumption 8: Every cell probability $\pi_{ij}$ as a function of the parameters
admits continuous first order differential coefficients.


Assumption 9: If $\pi_{i1}, \ldots, \pi_{ik_i}$ are the probabilities for the i-th multinomial
then

$$\sum_j \pi_{ij}(\theta_0) \log \frac{\pi_{ij}(\theta_0)}{\pi_{ij}(\theta)}$$

is bounded away from zero when $(\theta-\theta_0)^2 > \delta$, however small, for each i.
Following the arguments of Lemma 8 it can be shown that there exist consistent
estimates $\hat\theta_1, \ldots, \hat\theta_q$ having asymptotic distributions, such that


plim yE Аг—91001—04)— ... 0001—09) = 0 
for j = 1, ..., m and fixed $\lambda_1, \ldots, \lambda_m$. We may now prove the following in the same
way as Theorems 1 and 2.


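Theorem 3: Under the Assumptions 7, 8 and 9 the asymptotic distribution of
the statistic

$$\sum_{i=1}^{m} \sum_{j=1}^{k_i} \frac{n_i (p_{ij}-\hat\pi_{ij})^2}{\hat\pi_{ij}} = \sum \sum \frac{(O-E)^2}{E}$$

is $\chi^2(\Sigma k_i - m - q)$.

We use the decomposition

$$\sum \sum \frac{n_i (p_{ij}-\pi_{ij})^2}{\pi_{ij}} \sim \sum \sum \frac{n_i (p_{ij}-\hat\pi_{ij})^2}{\hat\pi_{ij}} + \sum \sum \frac{n_i (\hat\pi_{ij}-\pi_{ij})^2}{\pi_{ij}} \qquad (4.1)$$

where the left hand side is asymptotically $\chi^2(\Sigma k_i - m)$ and the last term in (4.1) is
asymptotically $\chi^2(q)$.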


Theorem 4: Under the Assumptions 7, 8 and 9 and the further assumptions
that (i) $\theta_1, \ldots, \theta_q$ can be represented as functions of t < q parameters $\alpha_1, \ldots, \alpha_t$; (ii) every
$\theta_i$ admits first partial derivatives in $\alpha_j$, which are continuous and (iii) a condition similar
to Assumption 9 in terms of $\alpha_i$ is satisfied, the a.d. of

$$\sum \sum \frac{n_i (\hat\pi_{ij}-\tilde\pi_{ij})^2}{\tilde\pi_{ij}} = \sum \sum \frac{(E_1-E_2)^2}{E_2}$$

is $\chi^2(q-t)$, where $\tilde\pi_{ij}$ denotes the estimate of $\pi_{ij}$ as a function of parameters $\alpha_i$, and $E_1$, $E_2$
are the expectations under the original and new specifications.


To prove the result we first obtain the decomposition

$$\sum \sum \frac{n_i (p_{ij}-\tilde\pi_{ij})^2}{\tilde\pi_{ij}} \sim \sum \sum \frac{n_i (p_{ij}-\hat\pi_{ij})^2}{\hat\pi_{ij}} + \sum \sum \frac{n_i (\hat\pi_{ij}-\tilde\pi_{ij})^2}{\tilde\pi_{ij}}$$

or

$$\sum \sum \frac{(O-E_2)^2}{E_2} \sim \sum \sum \frac{(O-E_1)^2}{E_1} + \sum \sum \frac{(E_1-E_2)^2}{E_2}$$

and proceed as in Theorem 2.
Theorems 3 and 4 concerning several multinomial distributions cover the
multidimensional contingency tables in so far as hypotheses specifying the cell
probabilities are concerned.

As an application let us consider the phenotypic frequencies of O, A, B, AB
blood groups in two samples of individuals from two communities. The first hypothesis
is that for each community the frequencies are consistent with Bernstein's theory.
There are then four parameters, $p_1$, $q_1$, representing the A and B gene frequencies in
one community and $p_2$, $q_2$ in the second. The second hypothesis specifies further
that the gene frequencies in the two communities are same, so that all the cell
probabilities involve only two parameters p, q. For estimating these parameters by
the m.l. method reference may be made to Rao (1952).


                    community 1                   community 2
phenotype       O      E_1      E_2           O      E_1      E_2

O             121    118.89   117.94        118    119.93   121.62
A             120    122.44   105.52         95     92.68   108.81
B              79     81.54    98.14        121    118.74   101.19
AB             33     30.13    31.40         30     32.65    32.37

The test for the first specification (consistency with Bernstein's theory) is

$$\sum \frac{(O-E_1)^2}{E_1} = 0.44 \text{ for community 1, d.f.} = 1$$

$$\phantom{\sum \frac{(O-E_1)^2}{E_1}} = 0.35 \text{ for community 2, d.f.} = 1$$

or a total of 0.79, which is small for $\chi^2(2)$. To test the equality of gene frequencies
the statistic is

$$\sum \sum \frac{(E_1-E_2)^2}{E_2} = 11.04, \text{ with } 4-2 = 2 \text{ d.f.}$$

This is significant at 1% level, indicating differences in the gene frequencies. In
a previous paper (Rao, 1948), the author examined the second hypothesis by using
a different large sample test. The present test based on the difference between
expected values under the two hypotheses seems to be more attractive in practice.
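
The arithmetic of this example can be retraced from the tabulated values alone. The sketch below is an illustration, not part of the original computation: the function names are hypothetical, and the tail probability uses the fact that a chi-square variate on 2 d.f. has upper tail exp(-x/2).

```python
import math

# Rows are phenotypes O, A, B, AB; each entry is (O, E1, E2) from the table,
# where E1 is the expectation under Bernstein's theory fitted per community
# and E2 the expectation under equal gene frequencies.
community_1 = [(121, 118.89, 117.94), (120, 122.44, 105.52),
               (79, 81.54, 98.14), (33, 30.13, 31.40)]
community_2 = [(118, 119.93, 121.62), (95, 92.68, 108.81),
               (121, 118.74, 101.19), (30, 32.65, 32.37)]

def fit_chi2(rows):
    """Sum of (O - E1)^2 / E1: goodness of fit to the first specification."""
    return sum((o - e1) ** 2 / e1 for o, e1, _ in rows)

def difference_chi2(rows):
    """Sum of (E1 - E2)^2 / E2: test of the further specification."""
    return sum((e1 - e2) ** 2 / e2 for _, e1, e2 in rows)

def chi2_upper_tail_2df(x):
    """Upper tail probability of chi-square on 2 d.f., equal to exp(-x / 2)."""
    return math.exp(-x / 2)

fit_1 = fit_chi2(community_1)          # about 0.44
fit_2 = fit_chi2(community_2)          # about 0.35
equality = difference_chi2(community_1) + difference_chi2(community_2)  # about 11.04

# Degrees of freedom: (k_1 + k_2) - m - q for the fit, and q - t for equality.
df_fit = (4 + 4) - 2 - 4
df_equality = 4 - 2

print(round(fit_1, 2), round(fit_2, 2), round(equality, 2))
print(df_fit, df_equality, chi2_upper_tail_2df(equality) < 0.01)
```

The tail probability of 11.04 on 2 d.f. is about 0.004, in agreement with significance at the 1% level.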
A METHOD OF FRACTILE GRAPHICAL ANALYSIS*


By Р. С. MAHALANOBIS 
Indian Statistical Institute 


Fractile Graphical Analysis is a new method of statistical analysis which provides an effective
summary of information, particularly useful in situations where the data do not permit a description
in terms of a few parameters relating to the distribution; it also provides a graphical way of
testing differences between groups. This method can be used for any variate which can be ranked. In
this paper, the use of this method is illustrated for the comparison of economic data (relating either to
the same population at different points of time or to different populations) by means of examples taken
from the Indian National Sample Survey.


This paper provides some examples of the use of fractile graphical analysis,
a new method for the comparison of economic data relating to the same population
over time or to any two populations that differ as to geographical region or in any
other way. This method can be used for any variate that can be ranked, and is based
on certain theoretical conjectures. Asymptotic, but not exact, proofs are available
for some of these conjectures and results of model sampling experiments have been
found to be in accordance with them.


1. THE METHOD 


1.1. In the National Sample Survey of India much economic and demographic
data are collected every year in the successive “rounds” of a survey, each “round” 
extending over several months. The data are usually tabulated separately for the 
different States (which constitute the Union of India) or for groups of States, for India 
as a whole and with breakdowns, or in some cases, for rural and urban areas. Consider, 
for example, surveys of household budgets. The total or per capita consumption 
expenditure for 30 days (or any given reference period) and also the per capita expendi- 
ture on, say, foodgrains, all items of food, drugs, or cloth, etc., would be reported 
for each sample household. As a probability sample is used in each case, it is possible 
to estimate any of the characteristics for the whole population. In this way, infor- 
mation is available on the distribution of households by size of total or per capita 
consumption expenditure, or of the expenditure on individual items, for each size
class, for the different States and for India, and over time from round to round of the 
National Sample Survey. The question naturally arises whether the pattern of con- 
sumption is or is not the same from one State to another; is or is not steady over time 
from one round of the survey to another, for the same State or for the whole of India, 


* This paper originally appeared in Econometrica, Vol. 28, 2 (April 1960) and is being reprinted 
with the permission of the Editor, Econometrica. 
1.2. The design of the survey is always such that Inter-Penetrating Samples
(IPS) are used in which two or more independent samples are drawn in exactly the
same way, according to the design of the survey and with replacement. These two
or more interpenetrating samples are statistically equivalent and supply independent
and equally valid estimates of all population characteristics. For convenience these
two or more samples may be called sub-samples, emphasizing the fact that they
can be pooled together to give one combined sample to supply valid estimates of
population characteristics based on all of the information.


1.3. Consider a sample from a given bivariate population, and let x represent
the per capita consumption expenditure and y the per capita expenditure on foodgrains
for a sample household. Assume that the size of each sub-sample is the same and
consists of N households each.¹ Also assume that each sample household has the same
chance of being included in each sub-sample. Then all sample households would
have the same probability weight for purposes of estimating the population
characteristics. The processing of the data in this case is extremely simple. Consider the
first sub-sample of N households; rank the households in ascending order of the
x-variate (in this case, the per capita expenditure of the household in the given reference
period).² Now divide the N sample units into g equal groups of n each, so that
N = gn. Next, find the average value of the y variate, for example, the per capita
expenditure on foodgrains, for each of the g groups. These may be called
$\bar{y}_{11}, \bar{y}_{12}, \bar{y}_{13}, \ldots, \bar{y}_{1g}$. Now take g equidistant points³ on the x-axis and plot the y
values, that is, $\bar{y}_{11}, \bar{y}_{12}, \bar{y}_{13}, \ldots, \bar{y}_{1g}$, on these g points x = 1, 2, ..., g. Finally, join the
successive points by straight lines.⁴ This would give a graph which may be called
G(1) for the first sub-sample.
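
The steps of this construction can be sketched in a few lines. This is a minimal illustration for the equal-weight case, assuming N is an exact multiple of g; the function name `fractile_means` and the small data set are hypothetical.

```python
def fractile_means(x, y, g):
    """Average of y within g equal groups of households ranked by x.

    Returns the g ordinates to be plotted at x = 1, 2, ..., g.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = len(x)
    if g <= 0 or total == 0 or total % g != 0:
        raise ValueError("N must be a positive multiple of g, so that N = g * n")
    n = total // g
    # Rank the households in ascending order of the x variate.
    ranked = sorted(zip(x, y), key=lambda pair: pair[0])
    # Average the y variate within each successive group of n households.
    return [sum(v for _, v in ranked[k * n:(k + 1) * n]) / n for k in range(g)]

# Hypothetical sub-sample of N = 6 households, g = 3 groups of n = 2.
total_expenditure = [30, 10, 20, 40, 60, 50]      # x variate
foodgrain_expenditure = [9, 4, 6, 10, 14, 12]     # y variate
means = fractile_means(total_expenditure, foodgrain_expenditure, 3)
points = list(enumerate(means, start=1))          # vertices of the graph G(1)
print(points)   # [(1, 5.0), (2, 9.5), (3, 13.0)]
```

Joining the successive vertices in `points` by straight lines gives the graph for the sub-sample.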


1.4. The procedure would be slightly more complicated when the sample
units are selected with varying probabilities of being included in the sample, and,
therefore, have varying probability weights for purposes of estimating the population
characteristics. Consider the case in which $x_i$ is the per capita total expenditure
and $y_i$ is the per capita expenditure on foodgrains for the i-th sample household. Let
$w_i$ be the probability weight of this household for estimating the population
characteristics; also, let the $w_i$'s be adjusted in such a way that the total probability weight,
that is, the sum of all values of $w_i$, equals one. The first stage of processing, namely,
ranking the sample households in ascending values of $x_i$, remains the same; the next
stage is to multiply $y_i$ (and other similar variates if there are more than one) by $w_i$;


¹ This can be arranged by specification in the design of the survey, but this is not essential. One
great advantage of the fractile graphical method is that the procedure would remain the same even when ...

² This can be done by hand or very quickly with punched cards by a sorter.

³ These points would be, of course, centred at the midpoint of each of the g groups. It is, however,
not necessary that the x-points should be equidistant; other scales can be used, for example, values
... group would be equal for a normal distribution. Any other suitable distribution may also ...

⁴ It is possible to use other rules (for example, graduating parabolas passing through
three or more successive points).
and then to form the accumulated sums of both $w_i$ and $w_i y_i$ for all sample households.⁵
It is now possible to form any desired number of fractile groups, say, g, each having
an equal proportion of the estimated total population of households. For example,
if g = 20, all that has to be done is to divide the sub-sample at successive five per cent
points into groups on the basis of the accumulated sums of $w_i$. It is also easy to obtain
the estimated average value of any population characteristic y for each of the fractile
groups from the corresponding accumulated sums of $w_i y_i$. These average values of
y are then plotted against g equidistant points on the x-axis, and successive plotted
points are joined by straight lines to supply the graph G(1).
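
The weighted procedure may be sketched as follows. This is an illustration under an explicit assumption that is not in the text: each household is placed wholly in the fractile group containing the midpoint of its share of the accumulated weight, rather than being split across a group boundary. The function name and the data are hypothetical.

```python
def weighted_fractile_means(x, y, w, g):
    """Weighted average of y in g fractile groups of equal estimated population.

    Households are ranked by x; weights are rescaled to sum to one; each
    household is assigned (as an assumption of this sketch) to the group
    containing the midpoint of its interval of accumulated weight.
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    if g <= 0 or total <= 0:
        raise ValueError("g and the total weight must be positive")
    ranked = sorted(zip(x, y, w), key=lambda row: row[0])
    sum_w = [0.0] * g      # accumulated w_i within each group
    sum_wy = [0.0] * g     # accumulated w_i * y_i within each group
    accumulated = 0.0
    for _, yi, wi in ranked:
        share = wi / total
        midpoint = accumulated + share / 2
        k = min(int(midpoint * g), g - 1)
        sum_w[k] += share
        sum_wy[k] += share * yi
        accumulated += share
    # A group that receives no household has no estimate (None).
    return [swy / sw if sw > 0 else None for sw, swy in zip(sum_w, sum_wy)]

# Hypothetical sub-sample of four households with unequal weights, g = 2.
x = [4, 1, 3, 2]
y = [40, 10, 30, 20]
w = [3, 3, 1, 1]
print(weighted_fractile_means(x, y, w, 2))   # [12.5, 37.5]
```

With equal weights the result reduces to the simple group averages of the equal-probability case.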

1.5. Consider next the second sub-sample, also of N sample households.
Rank them by ascending values of x; divide into g equal fractile groups; find the average
values $\bar{y}_{21}, \bar{y}_{22}, \ldots, \bar{y}_{2g}$ of the y variate for each of the g groups; plot these points on
the same chart and on the same g equidistant points as used for the first graph, G(1);
and join the successive points by straight lines. This would give a second graph,
G(2), for the second sub-sample.


1.6. The next step is to pool the two sub-samples together to form a single
sample of size 2N. Rank the sample units again by ascending values of x; group
them into g equal groups each of size 2n; find the average values of y, that is,
$\bar{y}_{\cdot 1}, \bar{y}_{\cdot 2}, \ldots, \bar{y}_{\cdot g}$, for each of the g groups; plot them on the same chart and on the
same g equidistant points; and join the successive points by straight lines. This
would give the combined graph which may be called G(1, 2) for the combined
sample. This completes the construction for the given population.⁶

1.7. I shall now make some assumptions which seem plausible. Consider
the area bounded by the two sub-sample graphs G(1) and G(2). (This can be lightly
shaded in the chart to make it distinct). We may use this area as a convenient
measure of "error"⁷ to be associated with the combined graph G(1, 2).
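
The area bounded by two such graphs can be computed exactly, since both are polygons on the same points x = 1, 2, ..., g. The sketch below is one way of doing so and is not taken from the paper; the function name is hypothetical, and the treatment of a crossing between two adjacent points follows from elementary geometry of the two triangles formed.

```python
def area_between(graph_a, graph_b):
    """Area enclosed between two polygonal graphs on x = 1, 2, ..., g.

    Each argument is the list of ordinates at the g equidistant points,
    taken one unit apart on the x-axis.
    """
    if len(graph_a) != len(graph_b):
        raise ValueError("both graphs must use the same g points")
    diffs = [a - b for a, b in zip(graph_a, graph_b)]
    area = 0.0
    for d0, d1 in zip(diffs, diffs[1:]):
        if d0 * d1 >= 0:
            # No crossing inside the interval: a trapezium of unit width.
            area += (abs(d0) + abs(d1)) / 2
        else:
            # The graphs cross: two triangles meeting at the crossing point.
            area += (d0 * d0 + d1 * d1) / (2 * (abs(d0) + abs(d1)))
    return area

# Hypothetical ordinates for two sub-sample graphs with g = 3.
g1 = [1.0, 3.0, 2.0]
g2 = [2.0, 2.0, 2.0]
print(area_between(g1, g2))   # 1.0
```

The same function applied to two combined graphs gives the area used later as a measure of separation, and restricting the ordinates to a sub-range gives the partial areas.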


⁵ If punched cards are used, this can easily be done by running them through a tabulator and
printing the accumulated subtotals for the desired fractile groups, or for fixed ranges of the x variate.
A convenient plan is to use twenty 5 per cent groups together with five 1 per cent groups at both the bottom
and the top, giving thirty groups altogether.

... whole of the information. The three graphs G(1), G(2) and G(1,2), would also contain the
information ... that the accuracy with which the information could be recovered would depend on the
... recovered from such functions with a margin of uncertainty which would depend on the "goodness of fit"
of the graduating functions and the margins of error given by the sub-samples.

Let F(1), F(2) and F(1,2) be the three graduating functions of the same specification which are
fitted to the data for the two sub-samples and for the combined sample, respectively. These three curves
can be easily plotted on the chart. Then the area bounded by the two graduating functions for the two
sub-samples F(1) and F(2) would supply a measure of the error to be associated with the graduating
function for the combined sample. Alternatively, the deviation from the graduating function can be measured
... (that is, the deviation from the graduating function) are less than the margin of error given by the area
bounded by the two sub-samples, it would then be possible to recover from the graduating function almost
the whole of whatever information can validly be used for statistical purposes relating to the estimated

SANKHYĀ : THE INDIAN JOURNAL OF STATISTICS : SERIES A


1.8. Now consider a second population and assume that a pair of
interpenetrating samples have been taken from it with the same sample size of N
households in each sub-sample. It is then possible to go through the same construction
as in the case of the pair of sub-samples from the first population. Let the
corresponding graphs for the second population be called G'(1), G'(2) and G'(1, 2). The area
between the sub-sample graphs G'(1) and G'(2) would then give the corresponding
"error"⁸ associated with the combined graph G'(1, 2) for the second population.

1.9. It is now possible to go one step further. The area between G(1, 2)
and G'(1, 2) may be used as a measure of the "separation," or over-all difference or
generalised distance, between the two populations. It is plausible to assume that the
"error" to be associated with the "separation" can be derived in the usual way⁹ from
the two component errors associated respectively with each of the combined graphs
G(1, 2) and G'(1, 2). It is also possible to consider not only the total separation but
the separation between any particular portions of the two combined graphs G(1, 2)
and G'(1, 2), that is, the area bounded by these two graphs and the corresponding
ordinates limiting any assigned fractile group. Each such partial separation would
also have its associated error which can be derived from the two component errors
lying between the same two ordinates. This furnishes a convenient tool for the
comparison, analysis, and the testing of significance of the separation between two
populations either over the whole or any part of the range of observations.
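
The combination of the two component errors "in the usual way" (the square root of the sum of their squares, as stated in the footnote) can be written out as a short sketch. The function names and the three areas used below are hypothetical; the paper does not prescribe a particular critical value for the ratio, so the ratio is reported without any verdict.

```python
import math

def separation_error(error_first, error_second):
    """Square root of the sum of the squares of the two component errors."""
    return math.hypot(error_first, error_second)

def separation_ratio(separation, error_first, error_second):
    """Separation between the combined graphs in units of its associated error."""
    return separation / separation_error(error_first, error_second)

# Hypothetical areas, all measured on the same chart and over the same range:
# between G(1) and G(2), between G'(1) and G'(2), and between G(1,2) and G'(1,2).
error_first = 3.0
error_second = 4.0
separation = 10.0
print(separation_error(error_first, error_second))              # 5.0
print(separation_ratio(separation, error_first, error_second))  # 2.0
```

For a partial separation the three areas would be taken between the same two limiting ordinates.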


2. ILLUSTRATIONS FROM THE NATIONAL SAMPLE SURVEY OF INDIA


2.1. Some actual examples from the National Sample Survey of India may
now be considered. In the 7th round (October, 1953 to March, 1954) information on
household consumption was collected in the form of two interpenetrating samples
of 702 and 711 sample households from 476 and 478 mauzas or villages, respectively,
extending over the whole of rural India (excluding Jammu and Kashmir). In another
survey carried out in the 9th round (May to November, 1955) similar information was
collected from 768 households, one each from 768 mauzas (out of 772 mauzas or villages
of which four were uninhabited)¹⁰ in each of the two sub-samples. The period of
reference was 30 days in each of the surveys. It is possible to use the per capita
expenditure on all items of consumption as the x variate. The sample households
were ranked (separately for each of the two sub-samples and for the combined sample)
in ascending order of the x variate (per capita consumption expenditure). The design
of the sample surveys was with varying probabilities, and appropriate probability
⁸ Although sub-samples have been assumed to be of equal sizes, this is not necessary. It is also
... be the sizes of the two sub-samples from the second population. Then whether $N_1 = N_2 = N'_1 = N'_2$
(as assumed above), or whether they are all different is entirely immaterial. The graphical measure of
the two "errors" would be given in each case by the two areas lying respectively between each pair of
sub-sample graphs G(1) and G(2), and G'(1) and G'(2).

⁹ That is, the square root of the sum of the squares of the errors of the two combined graphs
G(1,2) and G'(1,2).

¹⁰ The word "village" is used for the Indian revenue unit ... the mauza which broadly corresponds to
... land continues to be demarcated in ...
weights were used for estimating the population characteristics. The estimated
number of households and other characteristics of the population were obtained in the
usual way, and twenty equal fractile groups, each containing 5 per cent of the
households, were formed. The bottom and top 5 per cent groups were subdivided into
1 per cent groups (1, 2, 3, 4, 5 and 96, 97, 98, 99, 100 percentiles), giving 30 groups
altogether. Some additional fractile groups were also formed.¹¹

2.2. Tables I and II give the information for the 7th and 9th rounds
respectively.¹² In each table, columns (2), (3) and (4) give the limiting value of the per
capita expenditure at the upper end of the fractile group for (respectively) sub-sample
1 (which would be written in the contracted form s.s. 1), sub-sample 2 (s.s. 2) and the
combined sample. For example, in Table I for the bottom 1 per cent group, the
limiting values of per capita expenditure are Rs. 2.60 and Rs. 3.25 in the two
sub-samples and Rs. 3.00 in the combined sample. Columns (5), (6), and (7) in both
the tables give the average per capita consumer expenditure in each fractile group.
It is known that prices of consumer goods, especially of foodgrains, had greatly decreased
at the time of the 9th round. The effect of this fall in prices can be seen from these
two tables. Consider the fractile group between 70 per cent and 75 per cent of
households; the lower and upper limiting values of per capita expenditure in the 7th
round are, respectively, Rs. 20.00 and Rs. 22.00. It should be noticed, however,
that the fractile group between 75 per cent and 80 per cent in the 9th round has
approximately the same limiting values, Rs. 20.02 and Rs. 22.11. If, in terms of money
values, a fixed range had been used, the group having a per capita expenditure between
Rs. 20 and Rs. 22 per 30 days would have been between 70 and 75 per cent of the
households in the 7th round, but would have been almost between 75 and 80 per cent
in the 9th round. In the present approach the comparison between the two rounds
is made on the basis of the same fractile group, that is, between 70 and 75 per cent
or between 75 and 80 per cent of households in each of the rounds.


2.3. Consider also an associated variate y, the per capita expenditure on
foodgrains for each sample household. The average value of y, that is, the average
per capita expenditure on foodgrains for each fractile group, is given in columns
(2), (3), and (4) of Table III for the first sub-sample (s.s. 1), the second sub-sample
(s.s. 2) and the combined sample for the 7th round respectively. Corresponding
values for the 9th round are given in columns (2), (3), and (4) of Table IV. The
results are shown in Figure 1, where the x-axis represents percentages of
households, as ranked by per capita consumption expenditure. The limiting values of
the per capita expenditure at the upper end of each fractile group (in Table I,
for the 7th round) are shown at the top of the x-scale; and the corresponding

¹¹ Averages for larger fractile groups can be obtained directly by taking the average of an
appropriate number of equi-frequency fractile groups.

¹² Based on the sample survey, the estimated number of rural households was 63.4 million in
the 7th round and 65.3 million in the 9th round; the estimated rural population was 324 million and
338 million in the 7th and 9th rounds, respectively; and the estimated per capita consumption
expenditure in rural households was Rs. 5565 million and Rs. 5131 million per 30 days at current prices in the
7th and the 9th rounds, respectively. The total number of mauzas (villages) in India, excluding Jammu
and Kashmir, was 603,168 in the 1951 census.
TABLE I. NATIONAL SAMPLE SURVEY OF INDIA: ALL INDIA, RURAL
7TH ROUND: OCTOBER, 1953–MARCH, 1954*

TOTAL PER CAPITA CONSUMER EXPENDITURE IN RUPEES PER 30 DAYS

                        limiting value at upper end       average expenditure in
serial    fractile        of each fractile group            each fractile group
number    group
          (per cent)    s.s.1     s.s.2   combined       s.s.1     s.s.2   combined
            (1)          (2)       (3)      (4)           (5)       (6)      (7)

  1         1            2.60      3.25     3.00          2.44      2.20     2.23
  2         2            3.00      4.13     3.38          2.91      3.74     3.14
  3         3            4.00      4.75     4.57          3.35      4.54     4.10
  4         4            5.00      4.83     4.83          4.63      4.78     4.73
  5         5            5.57      5.25     5.50          5.32      4.99     5.13
  6         0–5          5.57      5.25     5.50          3.75      3.99     3.85
  7         5–10         7.95      6.88     7.1           6.63      6.24     6.39
  8         10–15        8.25      7.90     8.00          1.12      7.46     7.54
  9         15–20        9.20      9.00     9.25          8.77      8.53     8.64
 10         20–25        9.80      9.67     9.75          9.58      9.43     9.51
 11         25–30       10.60     10.25    10.50         10.22      9.93    10.03
 12         30–35       11.83     11.00    11.40         11.94     10.69    10.95
 13         35–40       13.00     12.00    12.43         12.44     11.45    11.94
 14         40–45       14.00     13.00    13.50         13.37     12.48    12.83
 15         45–50       14.75     14.25    14.40         14.23     13.64    14.00
 16         50–55       16.00     15.25    15.67         15.40     14.73    15.02
 17         55–60       17.60     16.60    17.00         16.82     15.99    16.37
 18         60–65       18.80     18.00    18.45         18.34     17.92    17.78
 19         65–70       21.00     19.25    20.00         19.90     18.67    19.18
 20         70–75       22.00     22.00    22.00         21.39     20.49    21.11
 21         75–80       25.00     24.00    24.33         23.34     23.02    23.12
 22         80–85       27.67     28.00    28.00         26.63     26.09    26.40
 23         85–90       32.75     32.33    32.67         30.07     29.74    29.95
 24         90–95       43.67     39.00    41.33         37.17     35.81    36.14
 25         95–100     226.00    264.50   264.50         74.11     58.97    64.98
 26         96          47.00     43.00    45.33         45.76     41.97    43.86
 27         97          59.00     48.00    51.00         51.91     45.51    47.34
 28         98          79.17     53.50    61.00         66.35     51.37    54.08
 29         99          76.00     65.25    74.33         15.46     59.64    67.05
 30         100        226.00    264.50   264.50        118.11    142.48   122.59
 31         0–20                                          6.74      6.49     6.64
 32         20–40                                        10.88     10.49    10.66
 33         40–60                                        14.92     13.97    14.44
 34         60–80                                        20.52     19.91    20.20
 35         80–100                                       39.77     36.45    38.03
 36         0–25                                          7.30      1.11     1.18
 37         25–50                                        12.41     11.73    12.03
 38         50–75                                        18.20     17.45    17.83
 39         75–100                                       36.45     33.44    34.86
 40         0–50                                          9.88      9.58     9.73
 41         50–100                                       26.57     25.20    25.87
 42         0–100                                        17.65     16.28    17.24

                                      sub-sample 1   sub-sample 2   combined
* number of sample villages                476            478          954
  number of sample households              702            711         1413
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TABLE II. NATIONAL SAMPLE SURVEY OF INDIA : ALL INDIA, RURAL 
Этн ROUND : MAY—NOVEMBER, 1955* 
TOTAL PER CAPITA CONSUMER EXPENDITURE IN RUPEES PER 30 DAYS 
———————————-———-— 
total per capita consumer expenditure in rupees per 30 days 


serial fractile limiting value at upper end of each average expenditure in each 
num- group fractile group fractile group 
ber (percent) 
8.8.1 8.8.2 combined s.s. 1 в.в.2 combined 
(1) (2) (3) (4) (5) (6) (7) 
1 1 3.19 2.98 3.16 2,99 2.58 2.71 
2 2 3.91 3.71 3.87 3.67 3.43 3.52 
3 3 4,28 4.16 4,20 4.13 4.01 4.06 
4 4 4.65 4.56 4.65 4.51 4.39 4.46 
5 5 4.91 4.79 4.90 4.85 4.72 4.78 
6 0—5 4.91 4.79 4.90 4.01 3.82 3.90 
7 5—10 5.99 5.85 5.92 5.43 5.39 5.39 
8 10—15 6.74 6.88 6.80 6.39 6.38 6.38 
9 15—20 7.54 7.90 7.76 7.25 7.35 7.28 
10 20--25 8.52 8.59 8.56 8.05 8.81 8.20 
11 25—30 9.36 9.57 9.53 8.96 9.19 9.06 
12 30—35 10.17 10.39 10.31 9.80 9.95 9.89 
13 35—40 11.03 11.09 11.07 10.66 10.74 10.69 
14 40—45 11.80 12.06 11.92 11.42 11.68 11.49 
15 45—50 12.83 13.02 12.88 12.35 12.59 12.43 
16 50—55 14.14 14.05 14.14 13.46 13.45 13.44 
17 55—60 15.13 15.55 15.24 14.57 14.73 14.64 
18 60—65 16.61 16.97 16.77 15.81 16.12 16.00 
19 65—70 18.49 18.59 18.56 17.61 17.88 17.78 
20 70—75 19.85 20.08 20.02 19.14 19.37 19.26 
21 75—80 22.59 22.01 22.11 21.22 21.21 21.19 
22 80—85 24.95 28.60 24,35 23.77 22.97 28.86 
23 85—90 29.57 27.53 28.59 27.40 25.42 26.93 
24 90--95 38.05 38.88 38.13 32.85 32.06 32.17 
25 95—100 194.41 128.86 194.41 55.96 53.30 54.31 
26 96 43.45 40.74 40.77 40.34 39.83 39.35 
27 97 47.41 44.16 46.30 45.60 42.94 44.01 
28 98 58.18 51.16 51.16 56.06 49.80 52.11 
29 99 66.86 67.69 66.86 65.34 72.22 62.86 
30 100 194.41 128.86 194.41 96.54 66.65 78.88 
31 0—20 5.64 5.72 5.67 
32 20—40 9.39 9.56 9.46 
33 40—60 13.01 13.15 19.06 
34 60—80 18.33 18.82 18.58 
35 80—100 32.90 33.15 32.99 
36 0—25 6.06 6.26 6.16 
37 25—50 10.61 10.77 10.68 
38 50—75 16.04 16.27 16.12 
39 75—100 30.53 30.45 30.43 
40 0—50 8.81 8.48 8.89 
41 50—100 22.65 28.10 22.85 
42 0-100 15.03 15.24 > 15.15 


* The number of sample villages was 772 (including 4 uninhabited mauzas) and the number of sample 
households was 768 in each sub-sample. 
TABLE III. NATIONAL SAMPLE SURVEY OF INDIA: ALL INDIA, RURAL 
7TH ROUND: OCTOBER, 1953—MARCH, 1954* 

AVERAGE PER CAPITA EXPENDITURE ON FOODGRAINS IN RUPEES AND AS 
FRACTION OF TOTAL EXPENDITURE PER 30 DAYS 

serial fractile average per capita expenditure average fraction of total 
num- group on foodgrains expenditure 
ber (percent) s.s.1 s.s.2 combined s.s.1 s.s.2 combined 
(1) (2) (3) (4) (5) (6) (7) 
1 1 1.09 2.83 2.16 0.436 7 0.588 0.572 
9 2 2.32 2.15 2.13 0.751 0.704 0.679 
3 3 2.26 2.57 2.74 0.656 0.550 0.656 
4 4 3.12 2.66 ЖЛ 0.639 0.624 0.576 
_ 5 5 3.70 3.92 3.39 0.632 0.622 0.610 
6 0-5 2.49 2.79 2.61 0.635 0.614 0.020 
ui 5--10 4.07 4.20 4.20 0.600 0.664 0.648 
8 10—15 5.07 4.4 4.70 0.616 0,554 0.573 
9 15—20 4.09 5.28 5.11 0.562 0.584 0.576 
10 20—95 5.71 4.66 - 5.17 0.572 0.535 0.550 
< 
11 25—30 5.56 5.39 5.53 0.560 0.560 0.562 
12 30—35 5.59 5.57 5.98 0.515 0.557 0.541 
13 35—40. 6.00 6.73 6.28 0.409 0.552 0.534 
14 40—45 6.75 7.29 6.82 0.487 0.546 0.492 
15 45—50 8.15 6.60 7.46 0.510 0.490 0.503 
16 50—55 7.38 6.83 1.21 0.481 0.475 0.491 
17 55-60 7.67 7.84 7.78 0.492 0.516 0.489 
18 60—65 7.31 8.49 7.85 0.398 0.464 * 0.444 
19 65-70 8.95 7.65 8.93 0.446 0.417 0.425 
20 70—75 8.29 9.27 8.74 0.423 0.456 0.440 
А 21 75—80 - 9.44 8.93 9.24 0.388 0.396 0.398 
99 80—85 8.69 8.69 8.59 0.358 0.382 0.366 
23 85—90 9.30 8.53 8.96 0.324 0.333 0.331 
^ 24 90—95 10.13 8.18 9.29 0.292 0.238 0.272 

a 25 95—100 12.02 12.19 11.78 0.214 0.294 0.219 

e 28 17 96 13.18... 13.09 10.38 0.303 0.322 9.261 

ҮКСЕЗ? 97 13.72 7 12.76 ТОЯ 0.253 0.277 0.275 

E 28 98 13.29 12.04 11.51 0.233 0.232 0.222 

2 29. 99 10.68 10.06 11.74 0.139 0.189 0.185 

30 100 9.27 13.21 - 10.38 0.081 0.161 0.102 
31 0—20 4.19 4.13 4.17 0.601 0.603 | 0.602 
32 20—40 5.72 5.70 5.77 0.534 0.552 0.547 
33 40—60. 7.54 7.16 7.30 0.492 0.506 0.494 
34 60—80 8.40 8.50 8.49 0.412 0.432 0.426 
35 80—100 9.87 9.25 9.54 0.296 0.295 0.296 
36 0—25 4.49 4.24 4.36 0.615 0.597 0.607 
37 25—50 6.51 6.47 6.46 0.524 0.551 0.536 
38 50—75 7.856 8.02 7.94 0.432 0.460 0.446 
39 75—100 9.79 9.18 9.48 0.268 0.274 0.272 
40 0—50 5.51 5.43 5.46 0.558 0.567 0.561 
41 50—100 8.74 8.58 8.67 0.329 0.341 0.335 
42 0—100 7.01 6.90 6.96 0.398 0.410 0.404 


sub-sample 1 sub-sample 2 combined 
* number of sample villages 476 478 954 
number of sample households 702 711 1413 
TABLE IV. NATIONAL SAMPLE SURVEY OF INDIA: ALL INDIA, RURAL 
9TH ROUND: MAY—NOVEMBER, 1955* 

AVERAGE PER CAPITA EXPENDITURE ON FOODGRAINS IN RUPEES AND AS 
FRACTION OF TOTAL EXPENDITURE PER 30 DAYS 

serial fractile average per capita expenditure average fraction of total 
num- group on foodgrains expenditure 
ber (percent) s.s.1 s.s.2 combined s.s.1 s.s.2 combined 
(1) (2) (3) (4) (5) (6) (7) 
1 1 1.72 1.65 1.66 0.581 0.654 0.638 
2 2 1.66 1.86 1.87 0.463 0.567 0.532 
3 3 2.46 2.36 2.34 0.597 0.604 0.594 
4 4 2.48 3.19 2.89 0.556 0.675 0.602 
5 5 3.52 3.17 3.25 0.686 0.672 0.657 
6 0—5 2.34 2.49 2.39 :0.571 0.684 0.603 
7 5—10 3.01 3.14 3.09 0.557 0.581 0.568 
8 10—15 3.43 3.54 3.48 0:550 0.550 0.554 
9 15—20 3.42 3.88 3.60. 0.511 0.541 0.524 
10 20—25 4.46 4.37 4.50 0,554 0.516 0.539 
11 25—30 4.28 4.50 4.39 0.509 0.502 0.500 
12 30—35 5.11 5.06 5.05 0.525 0.485 0.509 
18 35—40 5.44 5.43 5.34 0.522 0,528 0.517 
14 40—45 5.18 6.02 5.61 0.469 0.500 0.493 
15 45--50 6.25 5.70 6.00 0.495 0.484 0.488 
16 50—55 5.45 6.09 5.80 0,418 0.456 0.434 
17 * 55—60 7.05 5,99 6.56 0.477 0.417 0.456 
18 60—65 6.29 6.09 6.30 0.428 0.428 0.427 
19 65--70 5.68 7.03 6.18 0.348 0.391 0.355 
20 70—75 7.15 7.28 7.19 0.389 0.374 0.383 
21 75—80 7.41 7.61 7.50 0.354 0.360 0.356 
22 80—85 . 7.82 8.49 8.05 0.332 0.378 0.354 
23 85—90 8.36 8.09 8.21 0.299 0.330 70,316 
24 90—95 9.66 7:21 8.50 0.289 0.290 0.288 
25 95—100 10.51 8.97 9.50 0.189 0.193 0.190 
26 96 7.96 9.81 9.01 0.225 0.254 0.235 
21 97 11.06 6.73 8.68 0.246 ` 0.162 0.222 
28 98 9.64 8.79 7.31 0.173 0.217 0.165 
29 99 12.01 10.75 11.65 0.165 0.201 0.183 
30 100 13.98 9.86 * 11.52 0.142 0.114 0.140 
31 0—20 3.00 3.26 3:11 0.548 0.576 0.562 
32 20—40 4.81 4.85 4,81 0.528 0.506 0.516 
38 40—60 6.03 5.95 6.01 0.465 0.468 0.467 
294 60--80 6.56 7.04 6.79 0.379 0.385 0.880 
35 80--100 8.99 8.12 8.52 0.277 0,294 0.286 
86 0—25 3.25 3.49 3.38 0.537 0.557 0.549 
37 25—50 5.24 5.32 5.26 0.494 0.494 0.493 
38 50—75 6.30 6.49 6.39 0.393 0.399 0.396 
39 75—100 8.61 8.00 8.30 0.282 0.263 0.273 
40 0—50 4.94 4.39 4,31 0.510 0.517 0.514 
41 50—100 1.85 1.22 1.29 0.325 0.312 0.319 
42 0-100 5.70 5.71 5.70 0.379 0.373 0.376. 


* The number of sample villages was 772 (including 4 uninhabited mauzas) and the number of sample 
households was 768 in each sub-sample. 
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A METHOD OF FRACTILE GRAPHICAL ANALYSIS 


limiting values (in Table II for the 9th round) are shown at the bottom of the x-scale. 
The y-axis represents the average value of the per capita expenditure on foodgrains 
in rupees per 30 days for each 5 per cent fractile group. 


2.4. Now consider the three upper graphs in Figure 1, which represent G(1), 
G(2) and G(1, 2) for the 7th round. The shaded area between G(1) and G(2) supplies 
a measure of the error associated with G(1, 2), which has been called e. It will be seen 
from all three graphs that the per capita expenditure on foodgrains increases over 
the whole range from the bottom 5 per cent to the top 5 per cent of households, and 
that the general trend of the increase in expenditure is significant in comparison with 
the associated error. The three lower graphs in Figure 1 in the same way represent 
the three graphs G'(1), G'(2), G'(1, 2) for the 9th round, and the shaded area between 
G'(1) and G'(2) gives the error associated with G'(1, 2), or e'. In this case also the general 
trend of the increase in expenditure on foodgrains is significant in comparison with 
the associated error. 


2.5. Now consider the “separation” which is given by the area lying between 
the two combined graphs G(1, 2) and G'(1, 2) for the two rounds of the survey. The 
per capita expenditure on foodgrains was lower for every 5 per cent fractile group in 
the 9th round. From the bottom up to 80 per cent of the households, the separation 
is seen to be roughly greater than the sum of the two associated error areas,¹⁴ e and 
e'. The 9th round decrease in the expenditure on foodgrains for up to 80 per cent 
of households may, therefore, be considered statistically significant. The per capita 
expenditure on foodgrains was also less in the 9th round for the top 20 per cent of 
households, but the separation was here smaller than the two associated error areas. It 
is not possible, therefore, to assert that the decrease was significant for each of the 
four 5 per cent fractile groups at the top. This point can be further examined by 
pooling together the four 5 per cent fractile groups at the top to form a single 20 per 
cent group. For this group the average values for the 7th round from Table III 
are Rs. 9.87 and Rs. 9.25 for the two sub-samples and Rs. 9.54 for the combined 
sample. For the 9th round from Table IV the average values are Rs. 8.92 and Rs. 8.12 
for the two sub-samples and Rs. 8.52 for the combined sample. The difference between 
the two rounds for the estimates based on the combined samples is Rs. 1.02, which 
is only somewhat greater than the associated error. Even for the top 20 per cent 
group as a whole the decrease in the average per capita expenditure on foodgrains was 
not quite significant. 
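
The numerical check for the pooled top 20 per cent group can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: for a single fractile group the error of each round is taken as the absolute difference between the two sub-sample values, the error of the separation is taken as E = sqrt(e^2 + e'^2), and the function name `separation_ratio` is hypothetical rather than anything defined in the paper.

```python
import math


def separation_ratio(y1, y2, y12, y1_prime, y2_prime, y12_prime):
    """Compare one fractile group across two surveys.

    y1, y2, y12: sub-sample 1, sub-sample 2 and combined values (first survey).
    *_prime:     the same three values for the second survey.
    Returns (S, e, e', E, S/E).
    """
    # Assumption: for one group, the error is the gap between the sub-samples.
    e = abs(y1 - y2)
    e_prime = abs(y1_prime - y2_prime)
    # Separation between the two combined estimates.
    s = abs(y12 - y12_prime)
    # Assumption: error of the separation combines the two errors in quadrature.
    big_e = math.sqrt(e ** 2 + e_prime ** 2)
    ratio = s / big_e if big_e > 0 else float("inf")
    return s, e, e_prime, big_e, ratio


# Top 20 per cent group, using the figures quoted in paragraph 2.5.
s, e, e_prime, big_e, ratio = separation_ratio(9.87, 9.25, 9.54, 8.92, 8.12, 8.52)
print(f"S = {s:.2f}, e = {e:.2f}, e' = {e_prime:.2f}, E = {big_e:.2f}, S/E = {ratio:.2f}")
```

With these inputs S/E comes out only slightly above 1, which agrees with the statement that the difference of Rs. 1.02 is only somewhat greater than the associated error. Note also that E never exceeds e + e', so comparing the separation with the sum of the two error areas is the more cautious test.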

13 The two error areas, e and e', are seen to be roughly equal, and E, the error of the separation, 
is therefore very roughly 1.4 e. 

14 As the sum of the component areas (e + e') is greater than $\sqrt{e^2 + e'^2}$, there is a margin of safety 
in using the sum of the errors to test the significance of the separation. The advantage is that a visual 
comparison is possible. It is also possible to measure the separation, S, as well as the two associated 
errors e and e'; calculate E, the error of the separation; obtain S/E and plot these values for each fractile 
part. This would involve a certain amount of additional calculation, but would supply the necessary material 
in a convenient graphical form to test the significance of the separation. 


2.6. Consider next the per capita expenditure on foodgrains as a fraction 
of the total per capita consumer expenditure, of which the average values for each 
fractile group are given in columns (5), (6), and (7) of Table III and Table IV for the 
7th round and the 9th round, respectively. The corresponding graphs are shown 
in Figure 2. It will be noticed that the value of the fraction is generally lower in 
the 9th round, but the separation is now much less in comparison with the associated 
error areas. That is, the fraction of total consumer expenditure spent on foodgrains 
drops to some extent with a fall in prices but, on the whole, tends to remain much 
more stable than the actual expenditure on foodgrains. For the rural households 
as a whole, foodgrains account for from 38 to 40 per cent of all consumer expenditure 
and the fraction is over 60 per cent in the bottom 5 per cent of households. 


2.7. The same data can be shown in the form of cumulative percentages, 
which are given for the expenditure on all consumer items in columns (2), (3), and (4) 
and for the expenditure on foodgrains in columns (5), (6) and (7) in Tables V and VI 
for the 7th round and 9th round, respectively. Figure 3 shows the graphs for the 
7th round, and Figure 4 for the 9th round. The three upper graphs in Figure 3 
and Figure 4 represent the concentration of expenditure on foodgrains, and the area 
lying between the two sub-sample graphs G(1) and G(2) gives the error associated with 
the graph G(1, 2) based on the combined sample. The three lower graphs represent 
the concentration of total consumer expenditure. In both cases, the separation between 
G(1, 2) and G'(1, 2), or between the two sets of three graphs, is much greater than either 
of the two associated error areas. The concentration graph for expenditure on 
foodgrains thus lies significantly nearer the line of equal distribution, y = x, showing 
the inelastic nature of the expenditure on foodgrains. The area lying between the line 
y = x and the graph G(1, 2) can supply a convenient measure of concentration; this 
also will have the same associated error as G(1, 2). 
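
The construction of a concentration graph and of the area between it and the line y = x can be sketched as follows. This is a minimal sketch, not the computing procedure used in the survey: the function names are hypothetical, the graph is assumed to be drawn by joining the plotted points with straight lines, and the group totals in the usage lines are invented purely for illustration.

```python
def concentration_curve(group_shares, group_totals):
    """Cumulative percentages (x, y) of a concentration graph, starting at (0, 0).

    group_shares: percentage of households in each fractile group (sums to 100).
    group_totals: total of the variate (e.g. expenditure) in each group,
                  with groups ordered from the bottom fractile upwards.
    """
    total = sum(group_totals)
    xs, ys = [0.0], [0.0]
    cum_x = cum_y = 0.0
    for share, amount in zip(group_shares, group_totals):
        cum_x += share
        cum_y += amount
        xs.append(cum_x)
        ys.append(100.0 * cum_y / total)
    return xs, ys


def area_from_equal_line(xs, ys):
    """Area between the line y = x and the graph, by the trapezoidal rule.

    Assumes straight-line segments between plotted points; positive when
    the graph lies below the line of equal distribution.
    """
    area = 0.0
    for i in range(1, len(xs)):
        width = xs[i] - xs[i - 1]
        area += 0.5 * ((xs[i] - ys[i]) + (xs[i - 1] - ys[i - 1])) * width
    return area


# Hypothetical totals for five 20 per cent groups (not survey figures).
xs, ys = concentration_curve([20] * 5, [1, 2, 3, 4, 10])
print(ys)
print(area_from_equal_line(xs, ys))
```

A graph nearer the line y = x gives a smaller area; with equal totals in every group the area is zero. Applying the same function to the two sub-sample graphs separately would give two values whose difference indicates the error of the measure.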


2.8. The concentration curves (based on the combined sample in each case) 
for both the 7th and 9th rounds are shown in Figure 5. The two lower graphs, which 
represent the two concentration curves for total consumer expenditure, cross and 
recross, indicating that there was no change in this respect between the 7th and the 
9th round. The two upper graphs represent the two concentration curves for the 
expenditure on foodgrains. These two graphs also cross and recross up to the 50th 
percentile of households and again beyond the 90th percentile of households; but 
between the 50th and the 90th percentiles the graph for the 9th round (when prices 
were lower) is systematically below the graph for the 7th round. It is possible to 
examine these portions of the two graphs in greater detail. 
Figure 6 shows the concentration curves between the 50th and the 90th percentiles of 
households on a magnified scale for both rounds (with separate graphs for the two 
sub-samples and the combined sample in each case). The shaded area shows the error 
associated with the respective graphs. The “separation” of the two graphs G(1, 2) 
and G'(1, 2) between the two rounds is on the whole somewhat greater than the two 
associated error areas. This, if real, would indicate that in this middle region of 
households between the 50th and 90th percentiles (with total per capita consumer 
expenditure lying roughly between Rs. 13 or 14 and Rs. 30 or Rs. 32 per month) 
the expenditure on foodgrains tends to behave somewhat more as a necessity when 
prices are higher. The observed difference is based on very small samples and may 
have arisen through chance; but it may deserve more careful study with larger samples. 
TABLE V. NATIONAL SAMPLE SURVEY OF INDIA: ALL INDIA, RURAL 
7TH ROUND: OCTOBER, 1953—MARCH, 1954* 

CUMULATIVE PERCENTAGE OF TOTAL CONSUMER EXPENDITURE AND EXPENDITURE 
ON FOODGRAINS PER 30 DAYS 

cumulative percentage cumulative percentage of 
serial fractile of expenditure expenditure on foodgrains 
num- group 
ber (percent) s.s.1 s.s.2 combined s.s.1 s.s.2 combined 
(1) (2) (3) (4) (5) (6) (7) 
1 1 0.14 0.14 0.13 0.15 0.45 0.31 
2 2 0,26 0.36 0.31 0.40 0.85 0.61 
3 3 0.50 0.67 0.55 0.79 1.28 1.01 
4 4 SL. 0.93 0.86 1.31 1.62 1.45 
5 0-5 1.07 1.18 1.11 1.79 2.01 1.86 
6 5—10 3.09 3.13 3.00 4.90 5.20 4.93 
7 10—15 5.88 5.59 5.54 9.50 8.76 8.85 
8 15--20 8.22 1.64 8.01 12.84 11.87 12.41 
9 20—25 11.05 10,62 10.71 17.07 15.45 16,10 
10 25—30 13.91 13.07 13.35 20.99 18.71 19.70 
11 30—35 17.36 16.22 17.44 25,31 22.70 25.23 
12 35—40 21.02 21.19 21.05 29.75 29.83 29.93 
18 40—45 24.88 26.88 25.68 34.65 87.92 36.02 
14 45—50 30.26 30.71 30.47 42.38 42.46 42,84 
15 50—55 34.76 34.98 34.69 47.81 47.21 47.35 
16 55-—60 39.83 39.31 39.58 58.61 52.48 59.05 
17 60--65 45.91 44.11 44.64 59.71 58.22 58.63 
18 65—70 51.22 49.45 50.50 65.69 63.58 64.86 
19 70—75 56.11 55.46 55.15 70.41 70.19 70.23 
20 75—80 61.79 62.88 61.98 76.18 76.71 76.39 
21 80—85 69.75 70.00 69.99 82.71 82.92 82.84 
22 85—90 16.52 77.31 76.94 87.96 88.04 87.99 
23 90—95 85.45 87.26 85.84 94.08 93.58 93.65 
24 96 . 87.04 88.19 88.19 95.21 94.29 95.03 
25 97 89,37 92.14 90.29 96.74 96.99 96.67 
26 98 92.25 94.44 92.64 98.18 98.31 97.91 
27 99 93.87 96.83 95.08 98.75 99.28 98.97 
28 100 100.00 100.00 100.00 100.00 100.00 100.00 
sub-sample 1 sub-sample 2 combined 
* number of sample villages 476 478 954 
number of sample households 702 711 1413 




SANKHYA : THE INDIAN JOURNAL OF STATISTICS : SERIES А 


TABLE VI. NATIONAL SAMPLE SURVEY OF INDIA: ALL INDIA, RURAL 
9TH ROUND: MAY, 1955—NOVEMBER, 1955* 

CUMULATIVE PERCENTAGE OF TOTAL CONSUMER EXPENDITURE AND EXPENDITURE 
ON FOODGRAINS PER 30 DAYS 

cumulative percentage cumulative percentage of 
serial fractile of expenditure expenditure on foodgrains 
num- group 
ber (percent) s.s.1 s.s.2 combined s.s.1 s.s.2 combined 
(1) (2) (3) (4) (5) (6) (7) 
1 1 0.25 0.32 0.19 0.37 0.38 0.32 
2 2 0.65 0.38 0.49 0.86 . 0.02 0.74 
3 3 0.88 0.68 0.80 0.22 1.08 1.21 
4 4 1.25 1.11 1.17 1.75 1.91 1.85 
5 0-5 1.69 1.39 1.50 2.59 2.43 2.44 
6 5—10 3.88 3.25 3.54 5.80 5.33 5.54 
7 10—15 5.91 5.37 5.62 8.08 8.48 8.55 
8 15—20 8.32 8.03 8.13 11.67 12.24 11.85 
9 20—25 10.81 11.06 10.97 ^" 15.81 16.51 16.00 
10 25—30 14.36 14.33 14.34 19.79 20.80 20.34 
11 80—35 17.39 ^ 17.95 TII 23.96 25.73 24.92 
12 35—40 21.29 21.95 721.48 29.22 31.15 29.02 
13 40—45 24.80 "25.58 34,9757. 33.42 36.16 34.45 
14 45—50 29.36 29.02 . 29.38 39.51 41.07 40.12 
15 50--55 33.90 34.23 34.03 44.36 46.66 45.45 
16 55—60 39.30 39.14 39.30 51.26 52.01 61.73 
17 60--65 44.29 43.48 44.04 56.50 56.40 56.69 
18 65—10 51.02 48.75 49.97 62.22 61.96 62.17 
19: 1710-15 50.55 55.30 - 55.76 67.07 68.51 67.92 
20 15—380 62.69 62.40 62.45 73.32 75.32 74.22 
21 80—85 10.40 67.71 69.12 80.01 80.58 80.33 
22 85—90 79.09 15.85 76.74 87.00 87.52 86.66 
23 90--95 88.53 86.29 87.31 94.32 93.82 94.09 
24 96 90.55 87.79 88.87 95.37 94.81 95.04 
к 
95 97 92.39 90.18 91.4 96.55 95.82 96.39 
26 98 94.71 93.98 94.06 97.60 97.60 97.37 
aT 99 98.44 à 96.34 97.16 99.41 98.71 98.90 
28 _ 100 100.00 100.00 100.00 100.00 100.00 100.00 


* The number of sample villages was 772 (including 4 uninhabited mauzas) and the number of 
sample households was 768 in each sub-sample. 


3. REGIONAL DIFFERENCES 


3.1. Consider a second example. In the 8th round (July, 1954 to April, 
1955) of the National Sample Survey information was collected for the whole of rural 
India (including Jammu and Kashmir) on holdings of land by households. Data 
for two States, namely, West Bengal and Andhra, may be used to illustrate the graphical 
method for comparisons between different geographical regions in the same period of 
time. There were twelve interpenetrating samples. Out of this material, consider 
only two sub-samples of 18 villages each from only two States, West Bengal and 
Andhra, with a total number of villages of 38,590 and 18,912, respectively. The 
number of sample households in West Bengal was 569 in the first sub-sample and 
406 in the second sub-sample, with a combined sample of 975 households from an 
estimated number of 5,413,000; the sampling fraction was about one in 5,600. In 
Andhra, the number of sample households was 343 and 360 in the two sub-samples, 
respectively, giving a combined sample of 703 out of an estimated number of 4,066,000 
households; the over-all sampling fraction in this case was about one in 5,800. The 
variate selected for the present example is the land owned by each household in the 
rural area. There are, of course, many households owning no land, for example, 
households of landless labourers, artisans, professional people, etc., but they are included 
in the present study. The sample households were ranked by the size of their 
landholdings; and using appropriate probability weights, the number of households in the 
population and the area owned by each were estimated in the usual way. The estimated 
number of households was then divided into fractile groups and the land owned in 
each fractile group was also estimated. From this it is possible to calculate the 
cumulative percentage of owned area and also the average size of household ownership 
landholding in each fractile group. The data for West Bengal are shown in Table 
VII, in which columns (2), (3), and (4) give the limiting value at the upper end of each 
fractile group of the size of ownership holdings in acres for the first sub-sample, the 
second sub-sample, and the combined sample, respectively. The next three columns, 
(5), (6), and (7), give the cumulative percentage of land owned below and inclusive 
of each fractile group; and columns (8), (9), and (10) give the average size of a 
household ownership holding in each fractile group. Similar data are given for Andhra 
State in the corresponding columns of Table VIII. 
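
The steps just described (ranking, weighting, division into fractile groups, and calculation of limits, averages and cumulative percentages) can be sketched as below. This is a hedged illustration only: the function name `fractile_table` is hypothetical, the weights stand for whatever probability weights the survey design supplies, and a household whose weight straddles a group boundary is assumed here to be split between the two groups, which the paper does not specify.

```python
def fractile_table(values, weights, n_groups):
    """Build equal-weight fractile groups from a weighted sample.

    values:  variate for each sample household (e.g. acres owned).
    weights: estimated number of population households each one represents.
    Returns one tuple per group:
    (upper limit, weighted average, cumulative percentage of the total).
    """
    pairs = sorted(zip(values, weights))          # rank by size of the variate
    total_w = sum(w for _, w in pairs)
    total_v = sum(v * w for v, w in pairs)
    target = total_w / n_groups                   # households per fractile group
    rows = []
    grp_w = grp_v = cum_v = upper = 0.0

    def close_group():
        nonlocal grp_w, grp_v, cum_v
        cum_v += grp_v
        pct = 100.0 * cum_v / total_v if total_v > 0 else 0.0
        rows.append((upper, grp_v / grp_w, pct))
        grp_w = grp_v = 0.0

    for v, w in pairs:
        remaining = w
        while remaining > 0 and len(rows) < n_groups:
            take = min(remaining, target - grp_w)  # assumed: split at boundary
            grp_w += take
            grp_v += v * take
            upper = v
            remaining -= take
            if grp_w >= target * (1 - 1e-9):       # group is full
                close_group()
    if len(rows) < n_groups and grp_w > 0:         # rounding left a group open
        close_group()
    return rows


# Hypothetical holdings (acres) with equal weights, in four 25 per cent groups.
for row in fractile_table([0, 0, 1, 2, 3, 4, 10, 20], [1] * 8, 4):
    print(row)
```

Running the same function on sub-sample 1, on sub-sample 2 and on the pooled sample gives the three sets of columns of the kind shown in Tables VII and VIII; the gap between the two sub-sample results then supplies the associated error.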

3.2. Consider Figure 7.1, in which the x-axis represents the percentages 
of households (on the basis of ranking by amount of land owned) and the y-axis 
represents the average size of the land owned by households for each fractile group. 
The values for each of the five 20 per cent fractile groups have been plotted for both 
the sub-samples and the combined sample for both West Bengal and Andhra. The 
shaded area in each case gives the associated error. It can easily be seen that up 
to the 80th percentile there is overlapping of the error areas, showing that, on the basis 
of the available samples, the average size of holdings in the two States cannot be 
considered different. The average values for the top 20 per cent group are very 
clearly separated; and the average size of owned land for the top 20 per cent of 
households, as a whole, can be considered to be definitely higher in Andhra. 
TABLE VII. NATIONAL SAMPLE SURVEY OF INDIA: RURAL 
8TH ROUND: JULY, 1954—APRIL, 1955* 
HOUSEHOLD OWNERSHIP HOLDINGS 

State: West Bengal, Central Sample 

upper limit of size of cumulative percentage average size of household 
fractile ownership holding (acre) of owned area below ownership holding (acres) 
group 
s.s.1 s.s.2 combined s.s.1 s.s.2 combined s.s.1 s.s.2 combined 
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) 

1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 

0--15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
15—20 0.00 0.04 0.00 0.00 0.04 0.00 0.00 .02 0.00 
20—25 0.00 0.06 0.00 0.00 0.13 0.00 0,00 0.05 0.00 
25—30 9.00 0.17 0.04 0.00 0.30 0.06 0.00 һ 0.09 0.03 
30—35 0.04 0.37 0.08 0.03 0.74 0.19 0.01 0.25 0.06 
35—40 0.08 0.60 0.17 0.17 1.62 50.46 0.06 0.51 0.12 
40—45 0.15 0.89 0.36 0.43 2.95 1.03 0.11 9.76 0.27 
45--50 0.32 1,19 0.66 0.94 4.73 2.13 0.21 1.02 0.51 
50--55 0.52 1.52 1.00 1.91 7.04 3.94 0.39 1.32 0.84 
55--60 0.95 2.00 1.36 3.70 10.12 6.47 0.72 1.76 1.18 
60—65 1.33 2.40 1.86 6.49 13.92 9.88 1.13 2.18 1.59 
65--70 1.84 2,85 2.30 10.32 18.49 14.33 1.55 2.61 2.07 
10—75 2.20 3.33 2.92 15.37 23.85 19.94 2.04 3.06 2.60 
76—80 3.22 4.19 3.78 22.05 30.26 26.19 9.71 3.67 3.32 
80—85 4.31 5.79 4.79 31.23 38.05 36.30 3.71 4.79 4.28 
85--90 5.99 7.73 7.08 43.46 50.31 49.09 4.94 6.66 5.94 
90—95 8.60 12.46 10.70 60.87 67.41 66.88 7.06 9.81 8.29 
95—100 84,66 49.28 84.66 100.00 100.00 100.00 15,83 18.60 18.72 
96 9.19 13.53 12.75 65.30 71.97 70.92 8.88 12.96 11.82 

97 10.27 15.15 14.64 70.04 76.98 75.73 9.66 14.22 13.74 

98 14.91 16.64 16.64 76.29 82.54 81.20 12.71 15.80 15.64 

99 20.05 21.90 21.82 84.73 88.94 87.99 16.90 18.63 18.76 

100 84.66 49.28 84.66 100.00 100.00 100.00 31.07 31.41 32.84 
0--20 0.00 0.04 0 09.00 0.04 0.00 0.00 0.01 0.00 
20—40 0.08 0.60 0.17 0.17 1.62 0.46 0.02 0.23 0.05 
2-40-60 0.95 2.00 1.36 3.70 10.12 6.47 0.36 1.21 0.70 
60—80 3.22 4.19 3.78 22.05 30.26 26.19 1.86 2.88 2.39 
80--100 84.66 49.28 84.66 100.00 100.00 100.00 7.89 9.97 8.47 
0—50 0.32 1.19 0.66 0.94 4.73 2:18 0.04 0.27 0.10 
50—100 84.66 49.28 84.66 100.00 100.00 100.00 4.01 5.45 4.55 
0--100 84.66 49.28 84.66 100.00 100.00 100.00 2.02 2.86 2.32 


sub-sample 1 sub-sample 2 combined 
* number of sample villages 18 18 36 
number of sample households 569 406 975 
TABLE VIII. NATIONAL SAMPLE SURVEY OF INDIA: RURAL 
8TH ROUND: JULY, 1954—APRIL, 1955* 
HOUSEHOLD OWNERSHIP HOLDINGS 

State: Andhra, Central Sample 

upper limit of size of cumulative percentage average size of household 
fractile ownership holding (acre) of owned area below ownership holding (acres) 
group 
s.s.1 s.s.2 combined s.s.1 s.s.2 combined s.s.1 s.s.2 combined 
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) 
1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
10 0.00 0.00 0.00 0.00 ' 0.00 0.00 0.00 0.00 0.00 
0—15 0.00 0.01 0.00 ° 0.00, . 0.01 0.00 0.00 0,01 0.00 
15—20 0.01 0.01 0.01 0.00 ‚0.02 0.01 0.00 0.01 0.01 
20—25 0.01 0.01 0.01 0.01 ‚0.03 0.02 0.01 0.01 0.01 
25—30 0.02 0.02 0.02 0.03 0,05 0.04 0.02 0.02 0.02 
30—35 0.04 0.05 0.04 0.07 0.09 0.08 0.03 0.03 0.03 
35—40 0.26 0.09 0.10 0.15 0.15 0.15 0.07 0.05 0.06 
40—45 0.52 0.42 0.51 0.73 0.41 0.59 0.45 0.23 0.35 
45—50 0.95 0.61 0.78 1.57 1.00 1.31 0.65 0.51 0.59 
50—55 1.52 1.01 1.02 2.95 2.02 2.48 1.07 0.89 0.96 
55--60 2.10 1.40 1.68 5.27 3.29 4.27 1,81 1.11 1.37 
60—65 3.02 1,80 2.32 8.49 5.12 6.55 2.50 1.58 1.95 
65--70 3.87 2.50 3.02 12.74 7.53 9.85 3.31 2,0 2.70 
10—75 4.74 3.01 4.02 18.21 10.76 14.06 4.26 2.80 3.42 
75—80 5.77 4.61 5.39 24.95 15:02 19.86 5.25 3.69 4.74 
80—85 8.09 6.34 7.05 33.66 21.50 27.33 6.78 5.64 6.11 
85—90 10.72 9.41 9.86 * 46.39 30.54 37.51 9.14 7.83 8.92 
90--95 15.84 15.42 15.34 61.35 43.96 52.36 12.42 11.69 12.07 
95—100 324.52 345.52 345.52 100.00 100.00 100.00 30.10 48.55 38.93 
96 17.08 18.69 17.31 65.02 47.87 56.30 16.46 16.62 16.34 
97 19.03 25.82 21.04 70.45 52.90 61.00 18.16 21.88 19.04 
98 22.98 35.64 28,18 75.65 59.94 66,98 20.96 30.65 24.20 
99 31.56 50.17 42.07 82,52 70.13 75.15 27.13 44.39 33.87 
100 324.52 345.52 345.52 100.00 . 100.00 100.00 67.42 130.03 100.54 
0--20 0.01 0.01 0.01 0.00 0.02 0.01 0.00 0.00 0.00 
20—40 0.26 0.09 0.10 0.15 0.15 0.15 0.03 0.03 0.03 
40--60 2.10 1.40 1.08 е 5.27 3.29 4.27 0.99 0.68 0.82 
60—80 5.77 4.61 5.39 24.95 15.02 19.83 3.83 2.54 3.20 
80--100 324,52 345.52 345.52 100.00 100.00 100.00 14.61 18.45 16,35 
0--50 0.95 0.61 0.78 1.57 1.00 1.81 0,12 0.09 0.11 
50—100 324.52 345.52 345.52 100.00 100.00 100.00 7.66 8.60 8.05 
0—100 324.52 345.52 345.52 100.00 100.00 100.00 3.89 4.34 4.08 


sub-sample 1 sub-sample 2 combined 
* number of sample villages 18 18 36 
number of sample households 343 360 703 
3.3. It is possible to make a more detailed examination of the range above 
65 per cent of households. The average size of land owned in each 5 per cent fractile 
group between the 65th and 100th percentiles is plotted in Figure 7.2 for both West 
Bengal and Andhra; and the associated error areas are also shown in the usual way. 
In comparison with the associated error, the separation is significant for the group 
falling between the 90th and 95th percentiles, and is on the verge of significance 
for the fractile group between the 80th and 85th percentiles. 


[Figures 7.1, 7.2 and 7.3. x-axis: percentage of households; y-axis: average size of 
household holdings (acres). Separate graphs are drawn for sub-sample 1, sub-sample 2 
and the combined sample, for West Bengal and for Andhra.] 


Figure 7.1-7.2-7.3. National Sample Survey of India, rural. 
8th round: July, 1954-April, 1955. 
Household ownership holding. 

3.4. It is possible to go a step further and plot the average size of holdings 
[...] between 90 and 100 per cent. This is shown in Figure 
7.3. [...] even for 1 per cent groups, beyond 90 per cent of the 
households; but the separation becomes significant only at the level of the top one 
per cent of households. In considering these results, the extremely small size of the 
sample, only 18 villages in each sub-sample, must be kept in mind. The flexibility 
of the present method can, however, be easily appreciated. 


4. COMPARISON WITH FREQUENCY DISTRIBUTIONS 


4.1. It is worthwhile to contrast the present approach with that of the 
frequency distribution of the usual type in which the class ranges would be fixed, 
say, in terms of the money value of the per capita expenditure (such as Rs. 5, 10, 15, 
20, etc.). In economic data, the money value of the expenditure would, in general, 
change with changes in price. The population of rural households as a whole in the 
7th round may, of course, be compared in a meaningful way with the population of 
rural households as a whole in the 9th round. A fixed range frequency class would, 
however, represent different fractile groups in the two rounds and would not, therefore, 
be comparable in any important sense. The use of the same fractile groups would 
avoid conceptual difficulties involved in making comparisons over time or space 
or for two populations differing in any way. In the fractile approach, the bottom 
10 per cent or the top 5 per cent, etc., of households in one round of the survey can 
be considered to be the counterpart of the bottom 10 per cent or the top 5 per cent, 
etc., of households in another round of the survey. This would also be true for 
comparisons between two States or two geographical regions for the same round. These 
fractile groups, in other words, may be treated as so many economic “strata” of the 
whole population and comparisons over time or space of the same stratum would be 
meaningful for many purposes. 
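
The contrast between fixed class ranges and fractile groups can be made concrete with a small sketch. Everything in it is hypothetical: the expenditure figures are invented, the second round is assumed to differ from the first only by a uniform 20 per cent fall in prices, and the helper names are illustrative.

```python
def fractile_index(values, n_groups):
    """0-based fractile group of each value, by rank (equal-sized groups)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(values)
    return groups


def fixed_class_index(values, width):
    """0-based class of each value for fixed money ranges of the given width."""
    return [int(v // width) for v in values]


# Hypothetical per capita expenditures (rupees per 30 days) in one round.
round_a = [3.0, 6.0, 9.0, 12.0, 18.0, 24.0, 30.0, 45.0]
# Same households after an assumed uniform fall of 20 per cent in prices.
round_b = [v * 0.8 for v in round_a]

print(fixed_class_index(round_a, 5), fixed_class_index(round_b, 5))
print(fractile_index(round_a, 4), fractile_index(round_b, 4))
```

Under the uniform price change the households move between the fixed Rs. 5 classes, while their fractile groups are unchanged, so the same fractile group refers to the same economic stratum in both rounds.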

4.2. The contrast can be expressed in a slightly different way. In using 
fixed ranges (with varying frequencies) the main interest lies in the pattern of the 
frequency distribution as a whole. In using equi-frequency fractile groups (with 
varying class intervals) it is possible to use each fractile group itself as a stratum or 
unit of comparison. The effect of price changes may, for example, be different at 
different levels of consumption expenditure, that is, in different fractile groups, and 
the picture for the population as a whole may become blurred. In the fractile method 
it is possible to study the effect for each fractile group separately, that is, to break 
up the whole spectrum of the range of expenditure into smaller and more 
homogeneous groups. 

4.3. The fractile graphical approach offers an extremely rapid and practical 
method of analysis of statistical data of all types on a large scale. It is being used 
extensively now in the National Sample Survey of India. It would seem desirable 
to explore the possibilities of its applications and its usefulness in other fields. 


In conclusion I should like to express my thanks to my colleagues in the 
Indian Statistical Institute who helped in the preparation of this paper. 
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Appendix 
A METHOD OF FRACTILE GRAPHICAL ANALYSIS 


The theoretical basis of the fractile graphical method is briefly explained in 
this appendix. 


1. THEORETICAL BASIS 


1.1. The error associated with the combined fractile graph G(1, 2) is defined 
as the area lying between the two component sub-sample tractile graphs G(1) and 
G(2), and can be measured on the chart. This may be called е for the first population 
and e’ for the second population. 1 


1.2. For a non-graphical (that is, purely numerical) method of analysis, 
it is possible to define the error in either of the two usual forms, namely, (a) the sum 
of the differences (neglecting the sign of the difference) between the values of y for 
the two sub-sample graphs G(1) and G(2) for all fractile groups, or (b) the sums of the 
squares of those differences. All three were studied by model sampling experiments; and 
it was found that all three had similar distributions. The graphical definition has been 
selected, however, because of its simplicity. Once the accumulated totals are found 
(as explained in paragraphs 1.3—1.9 of the text) all the graphs can be easily drawn, 
and can also be interpreted directly by visual examination. In fact, junior computers 
can quickly learn this method of analysis, which makes it possible to use it on a very 
large scale. Secondly, the graphical method shows to what extent these two graphs
cross and recross each other, and can sometimes reveal whether systematic non-
sampling errors are present.


1.3. The combined fractile graph G(1, 2) will usually lie partly within and
partly outside the error area e; and may occasionally lie entirely outside the error
area e.


1.4. The separation has been defined as the area lying between the two 
combined graphs G(1,2) and G'(1, 2). 
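
The two area definitions in paragraphs 1.1 and 1.4 can be sketched numerically. The following is a minimal illustration, assuming each fractile graph is given as a list of g ordinates plotted at unit spacing and joined by straight lines; the function names `unit_area` and `area_between` are hypothetical and do not come from the paper.

```python
from typing import Sequence


def unit_area(u: float, v: float) -> float:
    """Area between two straight-line graphs over one unit interval.

    u and v are the signed gaps between the graphs at the left and
    right ordinates of the interval.
    """
    if u * v >= 0:
        # The graphs do not cross here: the region is a trapezium.
        return (abs(u) + abs(v)) / 2.0
    # The graphs cross once: the region is two triangles whose bases
    # add up to the unit spacing.
    return (u * u + v * v) / (2.0 * (abs(u) + abs(v)))


def area_between(graph_a: Sequence[float], graph_b: Sequence[float]) -> float:
    """Total area between two graphs with the same number of fractile groups."""
    if len(graph_a) != len(graph_b):
        raise ValueError("both graphs need the same number of fractile groups")
    gaps = [a - b for a, b in zip(graph_a, graph_b)]
    return sum(unit_area(gaps[i], gaps[i + 1]) for i in range(len(gaps) - 1))
```

Under these assumptions the error area e would be `area_between(G1, G2)` for the two sub-sample graphs, and the separation would be the same function applied to the two combined graphs.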


1 The presence of such non-sampling errors, of course, makes it impossible to give a rigorous
mathematical theory, but this difficulty cannot be avoided as it is inherent in the method of a sample
survey itself.

2. RESULTS


2.1. A number of plausible results can now be stated.2 Let N be the total 
number of sample units, g the number of fractile groups, and n the number of sample 
units in each fractile group, so that N = gn. 


2.2. The combined graph G(1, 2) would tend stochastically to lie entirely
within the error area e as n increases when g is kept constant.


2.3. The error area e would tend to decrease as $1/\sqrt{n}$ when g is kept
constant, or as $\sqrt{(n_1+n_2)/(n_1 n_2)}$ if the sizes of the two sub-samples are respectively $n_1$
and $n_2$ in each fractile group.


2.4. The error area e would tend to increase proportionately to g, the number
of groups, when n remains the same (or when $n_1$ and $n_2$, the number of sample units
for the two sub-samples in each fractile group, are kept constant).


2.5. Since N = gn, it follows that the error area e would tend to vary as
$g^{3/2}$ when N is kept constant. If g is changed, and the new value of g is k times its
original value, the relative changes in the error area would be approximately pro-
portional to $k^{3/2}$. This is a most useful property as it enables one to use different
values of g in testing the significance of the separation.
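
The scaling in paragraphs 2.3 to 2.5 can be checked with a few lines of arithmetic. This is only a sketch of the surmised proportionality (e varying as $g/\sqrt{n}$ with $n = N/g$), not an exact law; the function name and its arguments are illustrative assumptions.

```python
def relative_error_area(g_ratio: float, total_size_ratio: float = 1.0) -> float:
    """Surmised factor by which the error area changes.

    g_ratio is k, the new number of fractile groups divided by the old one;
    total_size_ratio is the new total sample size N divided by the old one.
    With e proportional to g / sqrt(n) and n = N / g, the factor is
    k ** 1.5 / sqrt(total_size_ratio).
    """
    if g_ratio <= 0 or total_size_ratio <= 0:
        raise ValueError("ratios must be positive")
    return g_ratio ** 1.5 / total_size_ratio ** 0.5
```

For instance, quadrupling g at fixed N would multiply the surmised error area by 8, while quadrupling N at fixed g would halve it.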


2.6. It is plausible to assume that the error (say, E) associated with the
“separation” (say, S) would be given by $E^2 = e^2 + e'^2$; and that $S^2/E^2$ would tend
to be distributed proportionally to $\chi^2$ with g degrees of freedom.


2.7. When x and y are both random variates and are also statistically
independent, the number of intersections of the two sub-sample graphs G(1) and
G(2) would tend to be distributed like “runs” of heads and tails in g throws of an
unbiased coin.
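
Counting intersections is a simple sign-change count on the gaps between the two graphs. The sketch below assumes graphs given as lists of ordinates at unit spacing; the helper names are hypothetical, and the coin-tossing benchmark only encodes the surmise as stated (g independent fair tosses), which a later paper in this issue reports does not hold exactly.

```python
from typing import Sequence


def count_intersections(graph_a: Sequence[float], graph_b: Sequence[float]) -> int:
    """Number of intervals in which the two straight-line graphs cross."""
    gaps = [a - b for a, b in zip(graph_a, graph_b)]
    # A crossing occurs in an interval when the gap changes sign across it.
    return sum(1 for u, v in zip(gaps, gaps[1:]) if u * v < 0)


def expected_changes_fair_coin(g: int) -> float:
    """Expected number of head/tail changes in g independent fair tosses."""
    if g < 1:
        raise ValueError("g must be at least 1")
    # Each of the g - 1 adjacent pairs differs with probability 1/2.
    return (g - 1) / 2.0
```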


2.8. The above results would remain true for any set of linear and nonlinear
transformations of the values of x and y in all the sub-samples.


3. SOME SUGGESTIONS FOR FURTHER WORK


3.1. The above results are, of course, only approximate, and the exact
results would depend on the statistical distribution of both x and y and on the relation-
ship between the two variates. For example, if n and g (and necessarily N, the size
of the sub-sample) are kept the same, then the error area e for sampling from a popu-
lation in which there is high correlation between y and x would tend to be smaller than
the error area for sampling from a population in which y and x are less closely associated.


2 These were first given in lectures at the Indian Statistical Institute in Calcutta in April, 1958
and at Berkeley, Chicago, and East Lansing in the United States in May and June, 1958. A preliminary
note with the results of some model sampling experiments was published in the Transactions of the Bose
Institute, Vol. XXII, 1958, pp. 223-230: “A Method of Fractile Graphical Analysis with Some Surmises
of Results.” Some further observations were made in lectures in Tokyo and Kiyushu, Japan in November
and December, 1958.

3.2. Many model sampling experiments have been carried out, and the results
are generally in agreement with the above conjectures within the margin of errors of
sampling. Some of the results of the model sampling experiments have been published;
other results are in the course of publication. The model sampling results present
interesting suggestions. For example, the error area e may vary proportionately to
$(g - \tfrac{1}{2})$ rather than to g, when n is kept constant. The number to be subtracted from
g may not be exactly half, but some such correction factor would probably be necessary.


3.3. Results of model sampling experiments can show the plausibility of the
conjectures but can never prove them in a theoretical sense. Theoretical investigations
are, therefore, proceeding and some asymptotic results have been obtained which are
broadly in agreement with the conjectures given above. A rigorous theory would
open up many possibilities


ON SOME PROPERTIES OF ERROR AREA IN THE
FRACTILE GRAPH METHOD


By K. TAKEUCHI 
Faculty of Economics, Tokyo 


SUMMARY : Some of the "surmises" given in Professor Mahalanobis’ lecture delivered on
22 November 1958 at Tokyo University are investigated and proved.

This paper consists of three parts. In section 1, the expectation and variance of the “error area"
a(1, 2) are computed, under the condition that $\bar y_i$ ($i = 1, 2, \dots, g$) be normally distributed. In section 2,
under rather general conditions, it is proved that $\bar y_i$'s are asymptotically normally distributed. In section
3, special cases where x and y are bivariate normally distributed are investigated, and the computational
results are compared with the experiments.

1. EXPECTATION AND VARIANCE OF THE ERROR AREA 


The definitions and notations used in this paper will be the same as in
Mahalanobis (1958), which are much the same as in Mahalanobis (1960).

We first consider the error area between the i-th and (i+1)-th coordinates. We
shall call it the i-th unit area and denote it by $A_i$. We assume that the distance
between two coordinates is equal to 1.

We put

$$u = \bar y_i - \bar y'_i, \qquad v = \bar y_{i+1} - \bar y'_{i+1}$$

(the prime referring to the second sub-sample), and assume that u and v are normally
distributed with means 0, common variance $\sigma^2$, and correlation $\rho$.


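Then,

$$A_i = \begin{cases} \dfrac{|u|+|v|}{2}, & uv \ge 0, \\[6pt] \dfrac{u^2+v^2}{2(|u|+|v|)}, & uv < 0. \end{cases}$$

Thus, since $p(u, v) = p(-u, -v)$,

$$E(A_i) = \iint_{uv \ge 0} \frac{|u|+|v|}{2}\, p(u, v)\,du\,dv + \iint_{uv < 0} \frac{u^2+v^2}{2(|u|+|v|)}\, p(u, v)\,du\,dv$$

$$= \int_0^\infty\!\!\int_0^\infty (u+v)\, p(u, v)\,du\,dv + \int_0^\infty\!\!\int_{-\infty}^0 \frac{u^2+v^2}{u-v}\, p(u, v)\,dv\,du,$$

where $p(u, v)$ denotes the density of bivariate normal distribution.

Putting

$$r\cos\theta = \frac{u+v}{\sqrt{2(1+\rho)}}, \qquad r\sin\theta = \frac{u-v}{\sqrt{2(1-\rho)}}, \qquad \alpha = \tan^{-1}\sqrt{\frac{1+\rho}{1-\rho}},$$

we obtain

$$\int_0^\infty\!\!\int_0^\infty (u+v)\, p(u, v)\,du\,dv = \frac{1}{2\pi\sigma^2}\int_{-\alpha}^{\alpha}\!\int_0^\infty \sqrt{2(1+\rho)}\,\cos\theta \cdot r^2 e^{-r^2/2\sigma^2}\,dr\,d\theta = \frac{\sqrt{1+\rho}}{\sqrt{\pi}}\,\sigma\sin\alpha = \frac{1+\rho}{\sqrt{2\pi}}\,\sigma,$$

$$\int_0^\infty\!\!\int_{-\infty}^0 \frac{u^2+v^2}{u-v}\, p(u, v)\,dv\,du = \frac{1}{2\pi\sigma^2}\int_{\alpha}^{\pi-\alpha}\!\int_0^\infty \frac{(1+\rho)\cos^2\theta + (1-\rho)\sin^2\theta}{\sqrt{2(1-\rho)}\,\sin\theta}\; r^2 e^{-r^2/2\sigma^2}\,dr\,d\theta$$

$$= \left\{ \frac{1+\rho}{2\sqrt{\pi(1-\rho)}}\,\log\frac{\sqrt{2}+\sqrt{1-\rho}}{\sqrt{1+\rho}} - \frac{\rho}{\sqrt{2\pi}} \right\}\sigma.$$

Therefore,

$$E(A_i) = \left\{ \frac{1+\rho}{2\sqrt{\pi(1-\rho)}}\,\log\frac{\sqrt{2}+\sqrt{1-\rho}}{\sqrt{1+\rho}} + \frac{1}{\sqrt{2\pi}} \right\}\sigma. \qquad \dots\ (1.1)$$

In particular,

$$E(A_i) = \frac{1}{\sqrt{2\pi}}\left(1 + \frac{1}{\sqrt{2}}\log(1+\sqrt{2})\right)\sigma = 0.6475\,\sigma \quad \text{when } \rho = 0,$$

$$E(A_i) = \sqrt{\frac{2}{\pi}}\,\sigma = 0.7979\,\sigma \quad \text{when } \rho = 1,$$

$$E(A_i) = \frac{1}{\sqrt{2\pi}}\,\sigma = 0.3989\,\sigma \quad \text{when } \rho = -1.$$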
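
As a numerical check on the three special values, the closed form for $E(A_i)$ can be evaluated directly. The sketch below assumes the expression $E(A_i) = \sigma\{(1+\rho)\log[(\sqrt2+\sqrt{1-\rho})/\sqrt{1+\rho}]/(2\sqrt{\pi(1-\rho)}) + 1/\sqrt{2\pi}\}$, which is our reading of the derivation; the function name is hypothetical, and the endpoints $\rho = \pm 1$ are handled as limits.

```python
import math


def expected_unit_area(rho: float, sigma: float = 1.0) -> float:
    """Mean of the unit error area for bivariate normal gaps (u, v).

    Assumes zero means, common standard deviation sigma and correlation rho.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    base = 1.0 / math.sqrt(2.0 * math.pi)
    if rho == 1.0:
        # Limit rho -> 1: u = v, so the unit area is |u|.
        return 2.0 * base * sigma
    if rho == -1.0:
        # Limit rho -> -1: u = -v, so the unit area is |u| / 2.
        return base * sigma
    s = math.sqrt(1.0 - rho)
    log_term = math.log((math.sqrt(2.0) + s) / math.sqrt(1.0 + rho))
    return sigma * ((1.0 + rho) * log_term / (2.0 * math.sqrt(math.pi) * s) + base)
```

With `sigma = 1` the function gives about 0.6475, 0.7979 and 0.3989 at `rho = 0, 1, -1`, in line with the values quoted in the text.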


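Similarly,

$$E(A_i^2) = 2\int_0^\infty\!\!\int_0^\infty \left(\frac{u+v}{2}\right)^2 p(u, v)\,du\,dv + 2\int_0^\infty\!\!\int_{-\infty}^0 \left\{\frac{u^2+v^2}{2(u-v)}\right\}^2 p(u, v)\,dv\,du$$

$$= \frac{1}{2\pi\sigma^2}\left[ (1+\rho)\int_{-\alpha}^{\alpha}\!\int_0^\infty \cos^2\theta \cdot r^3 e^{-r^2/2\sigma^2}\,dr\,d\theta + \frac{1}{4(1-\rho)}\int_{\alpha}^{\pi-\alpha}\!\int_0^\infty \frac{\{(1+\rho)\cos^2\theta + (1-\rho)\sin^2\theta\}^2}{\sin^2\theta}\; r^3 e^{-r^2/2\sigma^2}\,dr\,d\theta \right]$$

$$= \frac{\sigma^2}{\pi}\left[ (1+\rho)\int_{-\alpha}^{\alpha}\cos^2\theta\,d\theta + \frac{1}{4(1-\rho)}\int_{\alpha}^{\pi-\alpha}\left\{ \frac{(1+\rho)^2}{\sin^2\theta} + 4\rho^2\sin^2\theta - 4\rho(1+\rho) \right\} d\theta \right]$$

$$= \frac{\sigma^2}{\pi}\left[ (1+\rho)\left(\alpha + \tfrac{1}{2}\sin 2\alpha\right) + \frac{1}{4(1-\rho)}\left\{ 2(1+\rho)^2\cot\alpha + 2\rho^2(\pi - 2\alpha) + 2\rho^2\sin 2\alpha - 4\rho(1+\rho)(\pi - 2\alpha) \right\} \right], \qquad \dots\ (1.2)$$

where $\sin 2\alpha = \sqrt{1-\rho^2}$ and $\cot\alpha = \sqrt{(1-\rho)/(1+\rho)}$, and

$$V(A_i) = E(A_i^2) - \{E(A_i)\}^2.$$

When $\rho = 0$, $V(A_i) = \left(\tfrac{1}{4} + \tfrac{1}{\pi}\right)\sigma^2 - \{E(A_i)\}^2 = 0.1491\,\sigma^2$.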
Thus when we designate the error area by a(1.2), we have,

$$E\{a(1.2)\} = \sum_{i=1}^{g-1} E(A_i).$$

But to compute the variance of a(1.2), it is necessary to compute the covariance of
$A_i$ and $A_{i+1}$, which will be done later.
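If we alter the definition of the area, and put $A'_i = \dfrac{|u|+|v|}{2}$ and write
$a'(1.2) = \sum_{i=1}^{g-1} A'_i$, the distance area, then,

$$E(A'_i) = \sqrt{\frac{2}{\pi}}\,\sigma = 0.7979\,\sigma,$$

$$E(A_i'^2) = \iint \left(\frac{|u|+|v|}{2}\right)^2 p(u, v)\,du\,dv = 2\int_0^\infty\!\!\int_0^\infty \left(\frac{u+v}{2}\right)^2 p\,du\,dv + 2\int_0^\infty\!\!\int_{-\infty}^0 \left(\frac{u-v}{2}\right)^2 p\,dv\,du$$

$$= \frac{\sigma^2}{\pi}\left\{ (1+\rho)\int_{-\alpha}^{\alpha}\cos^2\theta\,d\theta + (1-\rho)\int_{\alpha}^{\pi-\alpha}\sin^2\theta\,d\theta \right\}$$

$$= \frac{\sigma^2}{\pi}\left\{ \frac{\pi}{2}(1-\rho) + 2\rho\alpha + \sin 2\alpha \right\} = \sigma^2\left\{ \frac{1}{2} + \frac{1}{\pi}\left(\rho\sin^{-1}\rho + \sqrt{1-\rho^2}\right) \right\}.$$

In general, $E(A'_i) \ge E(A_i)$, $V(A'_i) \ge V(A_i)$.
But when $\rho = 0$,

$$E(A_i) = 0.6475\,\sigma, \quad V(A_i) = 0.1491\,\sigma^2, \qquad \dots\ (1.3)$$

$$E(A'_i) = 0.7979\,\sigma, \quad V(A'_i) = 0.1817\,\sigma^2.$$

So that c.v.$(A_i) = 0.597 >$ c.v.$(A'_i) = 0.535$.
And when $\rho = 1$ always $A_i = A'_i$.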
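If we assume that cov$(\bar y_i, \bar y_j) = 0$ ($i \ne j$), we have the correlations

$$\rho(A'_i, A'_j) = \begin{cases} \tfrac{1}{2}, & j = i+1, \\ 0, & j \ge i+2. \end{cases}$$

And if we assume $V(\bar y_i) = \dfrac{\sigma^2}{2}$ ($i = 1, 2, \dots, g$), we have

$$V(a'(1.2)) = V\Big(\sum_{i=1}^{g-1} A'_i\Big) = \sum_i V(A'_i) + 2\sum_{i<j} \text{cov}(A'_i, A'_j) = V(A')(2g-3),$$

$$\text{c.v.}(a'(1.2)) = \frac{\sqrt{2g-3}}{g-1}\;\text{c.v.}(A'). \qquad \dots\ (1.4)$$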
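
The factor $2g-3$ in the variance of the distance area comes from adding $g-1$ equal unit variances and $g-2$ neighbouring pairs with correlation 1/2. The short sketch below reproduces that bookkeeping; the function names are hypothetical, and the default unit coefficient of variation 0.535 is the value quoted for $\rho = 0$.

```python
import math


def distance_area_variance_factor(g: int) -> float:
    """Variance of the distance area in units of V(A')."""
    if g < 2:
        raise ValueError("at least two fractile groups are needed")
    unit_terms = g - 1                       # one variance per unit area
    neighbour_terms = 2 * (g - 2) * 0.5      # 2 * cov, correlation 1/2
    return unit_terms + neighbour_terms      # equals 2g - 3


def distance_area_cv(g: int, unit_cv: float = 0.535) -> float:
    """c.v. of the distance area: sqrt(2g - 3) / (g - 1) times the unit c.v."""
    return math.sqrt(distance_area_variance_factor(g)) / (g - 1) * unit_cv
```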
Similarly when $V(\bar y_i) = \dfrac{\sigma^2}{2}$, cov$(\bar y_i, \bar y_j) = 0$ ($i \ne j$),
we have $\rho = 0$ and $\rho(A_i, A_j) = 0$, $j \ge i+2$.


And BA, Ans) = | | |4,4, рыбой 


DUE 11 (еее) p(u)p(0)p(vo)dudvdeo 


+a 


C [utoe 
[expe yf mop dde 


HT (uA vy Ew?) 
о о 4(v— НОЕ 
=ЗА-Ь-Ы, (say). 



-11 | (#-Еиш-Еюю-Еш)р(и)р(и)р(ф)р(ш)йийъйь 


{+ [| powt & | || wp(ayploydude. 
209 7276. 


| 


2 (1+5). Qs) 


srl к. подр | (д 


=} | [ (= "EET w) «(22+ ) plv)p(w)dedw. 


By the transformation $v = r\cos\theta$, $w = r\sin\theta$, we have
v т]2 00 e 
L=} | | a (2 т-у сов 2 жа pio PP dy 

40 оз T cos 0 40 


пу вби й Г i i cos 0+ sin 0 


= (3 8 tvt T 1) 


I, is difficult to compute exactly, but 


а-я) 
z | | ЧОТО ЕЕ): plu)p(v)p(w)dudedw 


0 


024-12 + 
"s ] * gp) peoydudvdw } х 


( pe Т plu)p(o)plwdudvdw \' 


2—38 Ronn RE 
2—8 — 


со 00 
а рвы» 
оо 
|В. со р 
1 1 —т/о? а? 
= | арра 70-6 
And by putting $v = v$, $u = v\tan\phi$, $w = v\tan\psi$,


z[2 [2 оо 
eu . 8008 ф вес? у | 43 sec? 2 
sy d $$ (1+ tan 914 tany)” “°° dul 
- (1+ tan? @ + tan, y) n? 
хе 207 дафау 
е seot ġ sect y dødy > 
Ат о à (1+ tang) (1+ tan y) (1-+ tan? y- tant у) 
x адау : 
4m 4, 1 (I+ tang) (1+ tan y) sec $ sec у 
тј 1 
803 сов? 46011 
қарады JN MN 
= llog 0y), (1.6) 


АЗ.) 
а Уш log (-- 3) а. dog (1+ /3))* } o? 


> 244.) < (2. ыж №8 (+ v2) } - 
that is $0.4524\,\sigma^2 < E(A_i A_{i+1}) < 0.4856\,\sigma^2$.
Consequently, cov$(A_i, A_{i+1}) = E(A_i A_{i+1}) - \{E(A_i)\}^2$
$< 0.0663\,\sigma^2$,
$> 0.0331\,\sigma^2$,
$0.222 < \rho(A_i, A_{i+1}) < 0.445$.
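
The step from the bounds on $E(A_i A_{i+1})$ to the bounds on the covariance and the correlation is plain arithmetic with the $\rho = 0$ values $E(A_i) = 0.6475\sigma$ and $V(A_i) = 0.1491\sigma^2$. A minimal sketch, in units of $\sigma$ and $\sigma^2$ and with hypothetical names:

```python
MEAN_UNIT_AREA = 0.6475      # E(A_i) / sigma when rho = 0
VAR_UNIT_AREA = 0.1491       # V(A_i) / sigma**2 when rho = 0


def neighbour_bounds(product_low: float, product_high: float):
    """Turn bounds on E(A_i A_{i+1}) into covariance and correlation bounds."""
    cov_low = product_low - MEAN_UNIT_AREA ** 2
    cov_high = product_high - MEAN_UNIT_AREA ** 2
    corr_low = cov_low / VAR_UNIT_AREA
    corr_high = cov_high / VAR_UNIT_AREA
    return (cov_low, cov_high), (corr_low, corr_high)


cov_bounds, corr_bounds = neighbour_bounds(0.4524, 0.4856)
```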
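This is not a good approximation, but it is enough to see that $\rho(A_i, A_{i+1})$ is smaller than
$\rho(A'_i, A'_{i+1}) = \tfrac{1}{2}$. To get a better approximation, we have to evaluate numerically
the integral that was only bounded above.

By putting $\rho(A_i, A_{i+1}) = \rho^*$,

$$V(a(1.2)) = V(A)\{(1+2\rho^*)g - (1+4\rho^*)\}.$$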
For large g, we have approximately 


e.y.(a(1.2)) = E ev.(A) < vs x 0.580 > у: 0.507. ... (L8) 
Remark added in proof : While Professor Mahalanobis was in Japan, I asked
him if the distance area may be better than the error area as defined by him in view
of the fact that c.v.$(A'_i)$ is smaller than c.v.$(A_i)$. The above shows that c.v.$(a'(1.2))$
need not be always smaller than c.v.$(a(1.2))$ because a small $\rho(A_i, A_{i+1})$ may
compensate for a large c.v.$(A_i)$.


Remark: In the case when $V(\bar y_i) \ne V(\bar y_{i+1})$, by quite similar but somewhat
lengthy calculations, we can have analogous results as above.
2. THE ASYMPTOTIC PROPERTY OF $\bar y_i$
We assume the following model :
$$y = E(y \mid x) + \eta = f(x) + \eta.$$
We put $z = f(x)$ so that $y = z + \eta$.


Assumption 1: $\eta$ is distributed independently of x (so of z), and has the
second order moment $\sigma_\eta^2 < \infty$.


Assumption 2: $f(x) = E(y \mid x)$ is a monotone increasing function of x.


Assumption 3: z has a continuous distribution with density $\phi(z)$ with a
derivative and the second order moment.
Suppose that we have $n = n'g$ observations $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$
where $x_1 < x_2 < x_3 < \dots < x_n$, so that we obtain
$(z_1, \eta_1), (z_2, \eta_2), \dots, (z_n, \eta_n)$ where $z_1 \le z_2 \le \dots \le z_n$.

We put

$$\bar y_i = \frac{1}{n'}\sum_{j=1}^{n'} y_{(i-1)n'+j} = \frac{1}{n'}\sum_{j=1}^{n'} \left( z_{(i-1)n'+j} + \eta_{(i-1)n'+j} \right)$$

and denote the same by $\bar z_i + \bar\eta_i$.

By Assumption 1, $\bar z_i$ and $\bar\eta_i$ are independent and when $n'$ is large, $\bar\eta_i$ is asymptotically
normally distributed with mean 0, and variance $O\!\left(\dfrac{1}{n'}\right)$. To prove the asymptotic
normality of $\bar z_i$, we utilize the following :
Lemma: Let $(x_1, \xi_1), (x_2, \xi_2), \dots$ be a sequence of pairs of random variables.
If the conditional probability distribution of $x_n$ given $\xi_n = \xi$ tends to a normal
distribution, with mean $\mu(\xi)$, variance $\dfrac{1}{n}\sigma^2(\xi)$, for every value of $\xi$; $\mu(\xi)$, $\sigma^2(\xi)$ are
continuous functions of $\xi$ and have second order derivatives; and $\xi_n$ tends to normality with
mean $\xi_0$ and variance $\dfrac{1}{n}\sigma_\xi^2$. Then, $x_n$ is asymptotically normally distributed, with mean $\mu$,
variance $\sigma^2$, where

$$\mu = \mu(\xi_0), \qquad \sigma^2 = \frac{1}{n}\left\{ \sigma^2(\xi_0) + \big(\mu'(\xi_0)\big)^2 \sigma_\xi^2 \right\}.$$

Proof: Let $f_n(t) = E\{e^{itx_n}\}$.


By assumption, 
Biel |) = exp | inéd — E җы +0 (2.)] 
= ехр | in(E)t + ip OE, — 8 — x (E+ 
+0(& — $5) -o(1& — 6) 9 (7.) 
= expli (by — ж exBex 


x exp |6, — 8) +00, — 65 --o(2&, — 5)»( 7). 


st) = шей 5) = exp [цу — 2. eb] x 
x E(exp[ (ОКЕ, — ® + О — 65 -0(26, 0) (7). 
= коб — 3. obe exp |--O ote + 0(1 )] 


-- exp (шау -e + (WER + o (1) ; 


which completes the proof. 


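Similarly, let $(x_1, \xi_1, \eta_1), (x_2, \xi_2, \eta_2), \dots$ be a sequence of random variables such that
the conditional probability distribution of $x_n$ given $\xi_n = \xi$ and $\eta_n = \eta$ tends to a normal distribution
with mean $\mu(\xi, \eta)$, variance $\dfrac{1}{n}\sigma^2(\xi, \eta)$ as $n \to \infty$, where $\mu$ and $\sigma^2$ are continuous functions
having second order derivatives, and that $(\xi_n, \eta_n)$ tends to a bivariate normal distribution
$N\!\left(\bar\xi, \bar\eta;\ \tfrac{1}{n}\sigma_\xi^2,\ \tfrac{1}{n}\sigma_\eta^2,\ \tfrac{1}{n}\sigma_{\xi\eta}\right)$; then $x_n$ is asymptotically normal with mean $\mu(\bar\xi, \bar\eta)$ and variance

$$\sigma^2 = \frac{1}{n}\left\{ \sigma^2(\bar\xi, \bar\eta) + \big(\mu_\xi(\bar\xi, \bar\eta)\big)^2\sigma_\xi^2 + \big(\mu_\eta(\bar\xi, \bar\eta)\big)^2\sigma_\eta^2 + 2\mu_\xi(\bar\xi, \bar\eta)\,\mu_\eta(\bar\xi, \bar\eta)\,\sigma_{\xi\eta} \right\},$$

where

$$\mu_\xi(\bar\xi, \bar\eta) = \frac{\partial}{\partial\xi}\mu(\xi, \eta)\Big|_{\xi=\bar\xi,\ \eta=\bar\eta}, \qquad \mu_\eta(\bar\xi, \bar\eta) = \frac{\partial}{\partial\eta}\mu(\xi, \eta)\Big|_{\xi=\bar\xi,\ \eta=\bar\eta}.$$

First we shall show the asymptotic normality of $\bar z_1$.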


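$$\bar z_1 = \frac{1}{n'}\sum_{j=1}^{n'} z_j,$$

and given $z_{n'} = t$, $n'\bar z_1$ is equal to the sum of constant t and $n'-1$ independent
variables, which have the same distribution $P(z \mid z < t)$ with density

$$\frac{\phi(z)}{\int_{-\infty}^{t}\phi(z)\,dz},$$

and since this probability distribution has the second-order moment, $\bar z_1$ is condi-
tionally asymptotically normally distributed given $z_{n'} = t$, with mean and variance,

$$\mu(t) = \frac{\int_{-\infty}^{t} z\phi(z)\,dz}{\int_{-\infty}^{t}\phi(z)\,dz}, \qquad \frac{1}{n'}\sigma^2(t) = \frac{1}{n'}\left\{ \frac{\int_{-\infty}^{t} z^2\phi(z)\,dz}{\int_{-\infty}^{t}\phi(z)\,dz} - \mu(t)^2 \right\}.$$

And since $z_{n'}$ can be regarded to be a $\tfrac{1}{g}$ fractile, it is asymptotically normal
$N\!\left(\xi_1,\ \dfrac{1}{n}\,\dfrac{\tfrac{1}{g}\left(1-\tfrac{1}{g}\right)}{\phi(\xi_1)^2}\right)$, where $\xi_1$ is the $\tfrac{1}{g}$ fractile of the population of z (Cramér
(1946)).

Then, applying the above lemma, $\bar z_1$ is asymptotically normal with mean,

$$\mu = \mu(\xi_1),$$

and since

$$\mu'(t) = \frac{\phi(t)\{t - \mu(t)\}}{\int_{-\infty}^{t}\phi(z)\,dz}, \qquad \mu'(\xi_1) = g\,\phi(\xi_1)\{\xi_1 - \mu(\xi_1)\},$$

the variance is

$$\frac{1}{n'}\sigma_1^2 = \frac{1}{n'}\left\{ \sigma^2(\xi_1) + \left(1 - \frac{1}{g}\right)\big(\xi_1 - \mu(\xi_1)\big)^2 \right\}. \qquad \dots\ (2.2)$$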
As for $\bar z_i$ ($i \ge 2$), considering first conditionally given $z_{(i-1)n'}$ and $z_{in'}$, then by utiliz-
ing the fact that $z_{(i-1)n'}$ and $z_{in'}$ are asymptotically bivariate normal, we can show
in quite a similar way as above, that $\bar z_i$ are asymptotically normally distributed
with variance $\dfrac{1}{n'}\sigma_i^2$, and $\sigma_i^2$ is expressed as
а} ras G+ 109 (Eas) 


4 сес (ша, 6) Sia)? + 


oats О а, SMMC. Q—6- o 07 
6425 74 
| 2800 н 

where Wo 5) = 8-2 = | аф ede 
| ф (42 i-a 

ы; 5 Sim 

S35 d н т 
| ems Fé y 

: 965 6) = Oe —— = 9 | е-и CY à Oe. 
| фе) “> 
fii 


Thus we obtain :
Theorem: Under Assumptions 1-3, $\bar y_i$'s are asymptotically normally distributed,
with variance

$$\frac{1}{n'}\left(\sigma_i^2 + \sigma_\eta^2\right).$$

Proof : This is an immediate consequence of the asymptotic normality of
$\bar z_i$ and $\bar\eta_i$.

Remark : The above assumption 2 is taken only to simplify the calculation;
monotonicity of f(x) is neither necessary nor essential. Assumption 1 also may
be relaxed.

Remark added in proof : A generalization was made by Prof. Kitagawa in
connection with these assumptions; he showed that the additive model of z is inessen-
tial.

3. NUMERICAL EXAMPLES WHEN x AND y ARE BIVARIATE NORMAL
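We consider the case where x and y are bivariate normal
$N(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho_{xy})$.
Let

$$z = \rho_{xy}\frac{\sigma_y}{\sigma_x}(x - \mu_x) + \mu_y, \qquad \eta = y - z,$$

then z and $\eta$ are independent, normal, with variances,

$$\sigma_z^2 = \rho_{xy}^2\sigma_y^2, \qquad \sigma_\eta^2 = (1 - \rho_{xy}^2)\sigma_y^2.$$

We shall take up the case of model sampling experiments in the Indian Statistical
Institute (Mahalanobis, 1958).

In this case

$$\mu_x = \mu_y = 0, \qquad \rho_{xy} = \frac{1}{\sqrt{2}}, \qquad \sigma_x^2 = \sigma_y^2 = 1.$$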
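(1) The case g = 2. According to the notation of Section 2,

$$\xi_1 = 0, \qquad \mu(\xi_1) = -\sqrt{\frac{2}{\pi}}\,\sigma_z, \qquad \sigma^2(\xi_1) = \left(1 - \frac{2}{\pi}\right)\sigma_z^2, \qquad n' = \frac{n}{2}.$$

By (2.2), we obtain,

$$V(\bar z_1) = \frac{1}{n'}\left\{ \sigma^2(\xi_1) + \frac{1}{2}\big(\xi_1 - \mu(\xi_1)\big)^2 \right\} = \frac{1}{n}\left(1 - \frac{1}{\pi}\right),$$

$$V(\bar y_1) = V(\bar z_1) + V(\bar\eta_1) = \frac{1}{n}\left(2 - \frac{1}{\pi}\right).$$

Obviously, $V(\bar z_1) = V(\bar z_2)$, $V(\bar y_1) = V(\bar y_2)$.
And since $\dfrac{\bar z_1 + \bar z_2}{2} = \bar z$ (total sample mean),

$$V(\bar z) = \frac{1}{4}\left\{ V(\bar z_1) + V(\bar z_2) + 2\,\text{cov}(\bar z_1, \bar z_2) \right\} = \frac{\sigma_z^2}{n} = \frac{1}{2n}.$$

Consequently,

$$\text{cov}(\bar z_1, \bar z_2) = \frac{1}{n}\left\{ 1 - \left(1 - \frac{1}{\pi}\right) \right\} = \frac{1}{\pi n},$$

$$\text{cov}(\bar y_1, \bar y_2) = \text{cov}(\bar z_1, \bar z_2) = \frac{1}{\pi n},$$

$$\rho(\bar y_1, \bar y_2) = \frac{1}{2\pi - 1}.$$

Using the notation of Section 1,

$$\sigma_u^2 = \sigma_v^2 = 2V(\bar y_1) = 2V(\bar y_2) = \frac{1}{n}\left(4 - \frac{2}{\pi}\right),$$

$$\text{cov}(u, v) = \text{cov}(\bar y_1 - \bar y'_1,\ \bar y_2 - \bar y'_2) = 2\,\text{cov}(\bar y_1, \bar y_2),$$

$$\rho_{uv} = \rho(\bar y_1, \bar y_2) = 0.1893,$$

$$\sigma = \frac{1}{\sqrt{n}} \times 1.834.$$

Then by (1.1) we have

$$E(a(1.2)) = E(A_1) = \frac{1}{\sqrt{n}} \times 1.246.$$

And by (1.1) and (1.2),

$$\text{c.v.}(a(1.2)) = \text{c.v.}(A_1) = 0.627,$$

which do not contradict the results of the experiments in Mahalanobis (1958).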
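
The figures for g = 2 can be reproduced in a few lines. The sketch below is an illustration under stated assumptions: it takes $\sigma_z^2 = \sigma_\eta^2 = 1/2$ (that is, $\rho_{xy} = 1/\sqrt2$, $\sigma_y = 1$), works in units where variances are multiplied by n, and uses our reading of the closed forms for the first two moments of the unit area; all function names are hypothetical.

```python
import math


def unit_area_mean(rho: float, sigma: float) -> float:
    """E(A) for -1 < rho < 1, as read from the derivation in Section 1."""
    s = math.sqrt(1.0 - rho)
    log_term = math.log((math.sqrt(2.0) + s) / math.sqrt(1.0 + rho))
    return sigma * ((1.0 + rho) * log_term / (2.0 * math.sqrt(math.pi) * s)
                    + 1.0 / math.sqrt(2.0 * math.pi))


def unit_area_second_moment(rho: float, sigma: float) -> float:
    """E(A^2) for -1 < rho < 1, with alpha = atan(sqrt((1+rho)/(1-rho)))."""
    alpha = math.atan(math.sqrt((1.0 + rho) / (1.0 - rho)))
    no_cross = (1.0 + rho) * (alpha + 0.5 * math.sin(2.0 * alpha))
    bracket = (2.0 * (1.0 + rho) ** 2 / math.tan(alpha)
               + 2.0 * rho ** 2 * (math.pi - 2.0 * alpha)
               + 2.0 * rho ** 2 * math.sin(2.0 * alpha)
               - 4.0 * rho * (1.0 + rho) * (math.pi - 2.0 * alpha))
    return sigma ** 2 * (no_cross + bracket / (4.0 * (1.0 - rho))) / math.pi


def two_group_example(sigma_z2: float = 0.5, sigma_eta2: float = 0.5) -> dict:
    """Moments for g = 2; every variance below is multiplied by n."""
    var_zbar = 2.0 * sigma_z2 * (1.0 - 1.0 / math.pi)   # n * V(zbar_1), n' = n/2
    var_ybar = var_zbar + 2.0 * sigma_eta2              # n * V(ybar_1)
    cov_ybar = 2.0 * sigma_z2 - var_zbar                # n * cov(ybar_1, ybar_2)
    rho = cov_ybar / var_ybar
    sigma = math.sqrt(2.0 * var_ybar)                   # sqrt(n) * sigma_u
    mean = unit_area_mean(rho, sigma)
    cv = math.sqrt(unit_area_second_moment(rho, sigma) - mean ** 2) / mean
    return {"rho": rho, "sigma": sigma, "mean": mean, "cv": cv}
```

Calling `two_group_example()` gives a correlation close to 0.1893, $\sqrt{n}\,\sigma$ close to 1.834, $\sqrt{n}\,E(A_1)$ close to 1.246 and a coefficient of variation close to 0.627.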


We state here some well-known results (see Kendall and Stuart (1958)) which
will be used in the remaining sections.


Let $( = m eB Ф () = І 
We have for # > 0 
Es et тер 
E E cet EST) RUE + R,(t) 
where | R,(t)| < 1.3... (2r—1) . 2+0 1, (8.2) 
00 < CONTE 


since the expression on the left hand side is monotonically decreasing.


(2) The case $g \ge 4$. In this case also we can compute the variances and
covariances of $\bar z_i$ and $\bar\eta_i$. Since the computations are rather lengthy, we shall give
upper and lower bounds for these.


Let : 2<:<% 


Vs) < V(55-541) = m © {Н imd = ш) 
т 2 
From (3.3) we see that


nog—i+1) | | 
Иа) < Tv quo) s (84) 


Next y ( Zito- ) Say (n= а? 
2 gn 
since Sp is an unbiased estimator of и, of which z is the efficient estimate, 


н 28 
ence V(%) > а (3.5) 


where $\rho_i$ is the correlation coefficient between $\bar z_i$ and $\bar z$. (3.4) and (3.5) furnish
upper and lower bounds for $V(\bar z_i)$.

(3) The case when g is large. We first show that $n'V(\bar z_i)$ tends to zero
uniformly in i as $g \to \infty$. Let $2 \le i \le g-1$.
From (2.3) we see that


ot < Ge 00 299-3, бери) 


For $2 \le i \le g-1$, we have
1 н 
= [49% > 6-65-09 (6) 
fi 
fi pl COM 
TT MUT) m < ет 
by an application of (3.1). 
Б 2 > 
Thus о < far s. (8.0) 
If $i = g$ we have from (2.3)
© — 2120" 9—1 
M ES 2 LA ARE, Mieg Edd р} 
Ona. | z NT 4г—[у+ 7 (ш-) 
У 27 ,-4/2 
whore Ig =O = Va e tl о? 
© ео» 
We also see that | 4 | ж ing. ———@ = py bo HP. 
И 
Using (3.2) with $r = 2$, we see that


const > Y T) 
È< Ea 2 . 2 (3.7) 


We further note that c? = 02 4,4. - 
We easily see that for a fixed k and any $\epsilon > 0$ we can find a G such that for
all $g > G$,


k 
o<cif2<i< LIE — <i «€ 0—1 irom (3.6) 


vi vg 
ack o. <і<4- d from (3.4) 
vg vg 
and оў <e ifi=1 or g from (3.7). 


so that $\sigma_i^2 \to 0$ uniformly in i as $g \to \infty$.


From this we see that $n'V(\bar y_i) = n'V(\bar z_i + \bar\eta_i) \to \sigma_\eta^2$ uniformly in i and
corr$(\bar y_i, \bar y_j) \to 0$ uniformly in i and j, so that, from (1.3) we have


pe: n zu аа 
E(a(1, 2)) = 0.6475 xox = i ra 23 (3.8) 

And, using (1.4) we have 
c.v. (a(1, 2) =? хо. 2. (3.9) 


Both these relations are in fair agreement with the model sampling experiments.


I am much indebted to Professor Masuyama for encouragement and advice
and to Mr. M. Shibuya, the discussion with whom was very suggestive and useful.
I am also grateful to Miss T. Hayashi and Miss Matsui for preparing this manuscript.

REFERENCES
CRAMÉR, H. (1946): Mathematical Methods of Statistics, Princeton University Press, 369.
KENDALL, M. G. and STUART, A. (1958): The Advanced Theory of Statistics, Vol. I. Distribution Theory,
Charles Griffin and Company, Ltd.
MAHALANOBIS, P. C. (1958): Lectures in Japan : Fractile Graphical Analysis, Indian Statistical Institute.
MAHALANOBIS, P. C. (1960): A method of Fractile Graphical Analysis. Econometrica, 28, 2, 325-351, reprinted in
Sankhyā, Series A, 23, 41-64.


Paper received : June, 1959.
Revised : March, 1960.


SOME LIMIT DISTRIBUTIONS CONNECTED WITH
FRACTILE GRAPHICAL ANALYSIS


By J. SETHURAMAN
Indian Statistical Institute
SUMMARY. Fractile graphs were introduced by Professor P. C. Mahalanobis (1958), (1960)*
for describing certain features of bivariate populations. The problem is to make useful comparisons
between two bivariate populations wherein the values of one of the variables, suspected to be inflated or
deflated by measurement, are omitted and its rankings alone are taken into account. Professor Mahalanobis
(see Mahalanobis (1958)) introduced the error area as a measure of divergence between two fractile graphs
and made some surmises concerning its behaviour. In this paper we introduce other measures of diver-
gence and investigate their limiting distributions in the light of his surmises (see Mahalanobis (1958)).


1. INTRODUCTION 


1.1. We deal only with some theoretical aspects of Fractile Graphical
Analysis. In our next note, we hope to deal with testing and other practical aspects
of Fractile Graphical Analysis.


The summary of all our results in this paper is given in Section 3 after the 
definitions and notations are dealt with in Section 2. 


2. DEFINITIONS AND NOTATIONS
2.1. Let $(y_1, x_1), \dots, (y_n, x_n)$ be a sample of n independent observations on
a random variable (Y, X). Let $n = mg$ where m and g are integers. Rearrange
the sample thus

$$(y_{(1)}, x_{(1)}), \dots, (y_{(n)}, x_{(n)})$$

so that $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$.
Let

$$u_i = \frac{1}{m}\sum_{r=(i-1)m+1}^{im} x_{(r)}, \qquad v_i = \frac{1}{m}\sum_{r=(i-1)m+1}^{im} y_{(r)}, \qquad i = 1, \dots, g. \qquad \dots\ (2.1.1)$$

Thus from a sample we get a sequence $(v_1, \dots, v_g)$. Plotting these against the equi-
distant points $1, \dots, g$ and joining the points successively, we get a curve G known
as the fractile graph of the samples.
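
The construction in 2.1 amounts to sorting the sample on x, cutting it into g consecutive groups of m observations, and averaging within groups. A minimal sketch, assuming the sample is given as (y, x) pairs and that n is an exact multiple of g; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def fractile_means(sample: Sequence[Tuple[float, float]],
                   g: int) -> Tuple[List[float], List[float]]:
    """Return (u, v): group means of x and of y for g fractile groups.

    sample holds (y, x) pairs; the ordinates of the fractile graph are v.
    """
    n = len(sample)
    if g <= 0 or n == 0 or n % g != 0:
        raise ValueError("sample size must be a positive multiple of g")
    m = n // g
    ordered = sorted(sample, key=lambda pair: pair[1])   # rank on x
    u, v = [], []
    for i in range(g):
        group = ordered[i * m:(i + 1) * m]
        u.append(sum(x for _, x in group) / m)
        v.append(sum(y for y, _ in group) / m)
    return u, v
```

Plotting `v` against the points 1, ..., g and joining successive points would give the fractile graph of the sample.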

2.2. In the problem we are to discuss we will deal with several samples.
Each sample will bear an index. The statistics from a sample with index (i) are
derived from those of the unindexed sample by affixing the index (i), for example,
$G^1$ denotes the fractile graph from the sample with index 1.

* Fractile Graphical Analysis was first introduced in a lecture at the Indian Statistical Institute,
Calcutta, in April 1958 and developed at Berkeley, Chicago, and at other places in USA in May and
June 1958, and later on in Japan.


Let
$$(y_1^1, x_1^1), \dots, (y_n^1, x_n^1)$$
$$(y_1^2, x_1^2), \dots, (y_n^2, x_n^2)$$
be two independent samples from the same bivariate population $P^{12}$. The pooled
sample is $(y_1^{12}, x_1^{12}), \dots, (y_{2n}^{12}, x_{2n}^{12})$. The sequences $(v_1^1, \dots, v_g^1)$, $(v_1^2, \dots, v_g^2)$ and $(v_1^{12}, \dots, v_g^{12})$
are easily defined. (Instead of m we put 2m in the last case because the sample is
of size 2n).

Let
$$(y_1^3, x_1^3), \dots, (y_n^3, x_n^3)$$
$$(y_1^4, x_1^4), \dots, (y_n^4, x_n^4)$$
be two independent samples from another bivariate population $P^{34}$. Samples (3)
and (4) pooled together yield $(y_1^{34}, x_1^{34}), \dots, (y_{2n}^{34}, x_{2n}^{34})$. The sequences $(v_1^3, \dots, v_g^3)$,
$(v_1^4, \dots, v_g^4)$, $(v_1^{34}, \dots, v_g^{34})$ are likewise defined.


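2.3. The error area $A_{12}$, for the two samples (1) and (2), is defined as the area
between $G^1$ and $G^2$ bounded by the ordinates at 1 and g. $A_{34}$ is the error area between
samples (3) and (4). The separation $A_s$ is defined as the error area between the samples
(12) and (34). We now give algebraic expressions for the error areas.

Let
$$v_i^1 - v_i^2 = w_{i(12)}, \qquad v_i^3 - v_i^4 = w_{i(34)}, \qquad v_i^{12} - v_i^{34} = w_{i(s)}, \qquad i = 1, \dots, g. \qquad \dots\ (2.3.1)$$

The area between $G^1$ and $G^2$ between the ordinates at i and i+1 is (we will omit the
subscript (12) for typographical convenience)

$$\tfrac{1}{2}|w_i| + \tfrac{1}{2}|w_{i+1}| \quad \text{if } w_i w_{i+1} \ge 0,$$

i.e., if the curves do not intersect between i and i+1, and

$$\tfrac{1}{2}|w_i| + \tfrac{1}{2}|w_{i+1}| - \frac{|w_i w_{i+1}|}{|w_i| + |w_{i+1}|} \quad \text{if } w_i w_{i+1} < 0, \qquad \dots\ (2.3.2)$$

i.e., if the curves intersect between i and i+1.

Thus

$$A_{12} = \sum_{i=1}^{g-1}\left\{ \tfrac{1}{2}|w_i| + \tfrac{1}{2}|w_{i+1}| - \theta(w_i, w_{i+1})\,\frac{|w_i w_{i+1}|}{|w_i| + |w_{i+1}|} \right\} \qquad \dots\ (2.3.3)$$

where
$$\theta(a, b) = \begin{cases} 0 & \text{if } ab \ge 0, \\ 1 & \text{if } ab < 0. \end{cases}$$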
2.4. A new statistic $\Delta$ which may be called the norm statistic and which
may be used instead of the error area is defined below. For the curves $G^1$ and $G^2$
the norm statistic $\Delta_{12}$ is given by

$$\Delta_{12}^2 = \sum_{i=1}^{g} w_{i(12)}^2. \qquad \dots\ (2.4.1)$$

$\Delta_{34}$ is the norm statistic between $G^3$ and $G^4$. $\Delta_s$ is the norm statistic between $G^{12}$
and $G^{34}$. $\Delta^2$ is a particular case of a more general class of statistics that can be defined
as soon as a positive definite matrix $B = (b_{ij})$ is given. Thus for the two fractile graphs
$G^1$ and $G^2$ we define

T$ Су т; ja») jus Pg: 2. (2.4.2) 


Similarly we define $\Gamma_{34}^2$ for $G^3$ and $G^4$ and $\Gamma_s^2$ for $G^{12}$ and $G^{34}$.


2.5. $A$, $\Delta$, $\Gamma$ are all, in a way, measures of consistency between two samples.
These statistics may also be used for testing in the two sample problem. The ad-
vantage associated with these statistics is that they provide a comparison between the
two samples over the whole range. An inequality that connects the norm statistic
$\Delta$ with error area $A$ is
^ aod 2 
=< А € Avg. SEO 
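
The two measures can be compared numerically from the same vector of gaps. The sketch below assumes the algebraic form of the error area with the indicator $\theta$ as in (2.3.3), and takes the norm statistic to be the Euclidean norm of the gaps, which is how $R_{12} = m\Delta_{12}^2 = m\sum w_{i(12)}^2$ in Section 6 suggests it should be read; the function names are hypothetical.

```python
import math
from typing import Sequence


def theta(a: float, b: float) -> int:
    """Indicator of a crossing: 1 if a and b have opposite signs, else 0."""
    return 1 if a * b < 0 else 0


def error_area_from_gaps(w: Sequence[float]) -> float:
    """Error area from the gaps w_1, ..., w_g between two fractile graphs."""
    total = 0.0
    for left, right in zip(w, w[1:]):
        a, b = abs(left), abs(right)
        total += 0.5 * a + 0.5 * b
        if theta(left, right):
            total -= a * b / (a + b)   # correction when the graphs cross
    return total


def norm_statistic(w: Sequence[float]) -> float:
    """Norm statistic taken as the square root of the sum of squared gaps."""
    return math.sqrt(sum(x * x for x in w))
```

Since each unit area is at most the average of the two absolute gaps, the error area cannot exceed the sum of the absolute gaps, and hence (by the Cauchy-Schwarz inequality) cannot exceed $\sqrt{g}$ times the norm statistic; this elementary bound is offered only as a check on the code, not as a restatement of the inequality in the text.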


2.6. The analysis can be suitably modified to yield concentration curves and
concentration ratios when X and Y are positive, thus : plot the points $z_0 = 0$,
$z_i = \dfrac{v_1 + \dots + v_i}{v_1 + \dots + v_g}$, $i = 1, \dots, g$, against the equidistant points 0, $i/g$, $i = 1, \dots, g$ and join
the points successively. We get a curve C known as the specific concentration curve
when the variables Y and X are different and as the Lorenz concentration curve when
Y = X. The concentration ratio is defined as twice the area between the
curve C and the straight line joining the points (0, 0) and (1, 1). The algebraic
expression for the concentration ratio is given by


4 
9У2%% 
ee BEM ы 
П g 
gÈ t f 
1 


3. SUMMARY OF RESULTS


3.1. (a) It is shown under some mild conditions that the asymptotic distri-
bution of $m\Delta_{12}^2 = R_{12}$ is a mixture of $\chi^2$ distributions, which we know can be approxi-
mated by a single $\chi^2$ distribution.

(b) For a suitably chosen B, namely $\Lambda$ defined in equation (5.3.8),
$m\Gamma_{12}^2 = Q_{12}$ is asymptotically distributed as a $\chi^2$ distribution with g degrees of freedom.

(c) It has been shown, in general, that $E(\Gamma_{12}^2) = \dfrac{g}{m}$ and in case (Y, X) is
distributed as a bivariate normal, that $E(\Delta_{12}^2) \approx \text{constant} \times \dfrac{g}{m}$, where $\approx$ means
‘approximately equal to’. Takeuchi (1961) has shown that when (Y, X) is bivariate
normal, $E(A_{12}) \approx \text{constant} \times \dfrac{g}{\sqrt{m}}$. It is also shown that, if g is fixed then $E(\Delta^2)$ and
$E(\Gamma^2)$ tend to zero at the rate of $\dfrac{1}{m}$, and from inequality (2.5.1) that $E(A)$ tends to
zero at the rate of $\dfrac{1}{\sqrt{m}}$.
: Ут 
(d) We have seen that $m\Delta_{12}^2 = R_{12}$ is asymptotically distributed as a mixture
of $\chi^2$ distributions. It is shown that $R_{12}$ and $R_s = 2m\Delta_s^2$ are asymptotically indepen-
dent so that

$$\frac{R_s}{R_{12}} = \frac{2\Delta_s^2}{\Delta_{12}^2}$$

is distributed asymptotically as the ratio of independent mixtures of $\chi^2$ distributions.
This distribution can be approximated by an F distribution. Also $Q_{12}$ and
$Q_s = 2m\Gamma_s^2$ are shown to be asymptotically independent so that

$$\frac{2Q_s}{Q_{12} + Q_{34}} = \frac{4\Gamma_s^2}{\Gamma_{12}^2 + \Gamma_{34}^2}$$

is distributed asymptotically as an F distribution with g and 2g degrees of freedom.

We will incidentally derive that the combined fractile graph $G^{12}$ tends to lie wholly
within the two fractile graphs $G^1$ and $G^2$ as $m \to \infty$.

(e) Let r be the number of intersections between the curves $G^1$ and $G^2$.
r is a random variable and may take any value from 0 to $g-1$. At first it was surmised
that r behaves as the number of changes in the ‘runs’ of heads in independent tosses of
an unbiased coin. It is proved here that this is not true. The asymptotic mean
of r is derived.


(f) It is shown that the concentration ratio X is asymptotically normal. 


4. A USEFUL THEOREM


4.1. In many cases when we are interested in the 
of a pair of random variables (У,, X. 


6.(е) of X, converge in law to G(x). 
(У, Xn) namely 


Ну, v) converges in law to | СО (4.1.1) 
Theorem 1: The assertion contained in equation (4.1.1) is true if the con-
vergence of the marginal distributions is slightly stronger, namely, the sequence of measures
$\mu_n(E)$ corresponding to $G_n(x)$ converges on every Borel set E to the measure $\mu(E)$ corres-
ponding to G(x).


The proof of this theorem and other related theorems under investigation 
will be published elsewhere. 
5. SOME LIMIT DISTRIBUTIONS
5.1. First we develop some more notations.

F(y, x) is the distribution function of a random variable (Y, X). We assume
that this distribution admits of a density function f(y, x) which is continuous. The
means $\mu$ and $\nu$ of X and Y respectively are finite. $\sigma^2$, $\rho\sigma\tau$, $\tau^2$, the variances and
covariance of X and Y are also finite.

The marginal distribution function of X is G(x) with a density function g(x)
which is continuous. $\theta_0, \theta_1, \dots, \theta_g$ are defined by the relations

$$G(\theta_i) = \frac{i}{g}, \quad i = 1, \dots, g-1, \qquad \theta_0 = -\infty, \quad \theta_g = +\infty. \qquad \dots\ (5.1.1)$$

We define $g(\theta_0) = 0$, $g(\theta_g) = 0$ and assume that $g(\theta_i) \ne 0$, $i = 1, \dots, g-1$.


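We next define the truncated means, variances and covariances as follows

$$\mu_i = g\int_{\theta_{i-1}}^{\theta_i} x\,g(x)\,dx; \quad i = 1, \dots, g \qquad \dots\ (5.1.2)$$

$$\nu_i = g\int_{\theta_{i-1}}^{\theta_i}\!\int_{-\infty}^{\infty} y\,f(y, x)\,dy\,dx; \quad i = 1, \dots, g \qquad \dots\ (5.1.3)$$

$$\sigma_i^2 = g\int_{\theta_{i-1}}^{\theta_i} (x - \mu_i)^2 g(x)\,dx; \quad i = 1, \dots, g \qquad \dots\ (5.1.4)$$

$$\tau_i^2 = g\int_{\theta_{i-1}}^{\theta_i}\!\int_{-\infty}^{\infty} (y - \nu_i)^2 f(y, x)\,dy\,dx; \quad i = 1, \dots, g \qquad \dots\ (5.1.5)$$

$$\rho_i\sigma_i\tau_i = g\int_{\theta_{i-1}}^{\theta_i}\!\int_{-\infty}^{\infty} (x - \mu_i)(y - \nu_i) f(y, x)\,dy\,dx; \quad i = 1, \dots, g. \qquad \dots\ (5.1.6)$$

We define some more quantities

$$\lambda(\theta_i) = E(Y \mid X = \theta_i); \quad i = 1, \dots, g-1, \qquad \lambda(\theta_0) = \lambda(\theta_g) = 0. \qquad \dots\ (5.1.7)$$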
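
As a concrete instance of the fractile points and truncated means, take X to be standard normal; this choice is an assumption made only for illustration. Then $\theta_i$ is the $i/g$ quantile and, because $\int_a^b x\,\varphi(x)\,dx = \varphi(a) - \varphi(b)$ for the standard normal density $\varphi$, the truncated mean is $\mu_i = g\{\varphi(\theta_{i-1}) - \varphi(\theta_i)\}$. The sketch uses `statistics.NormalDist` from the Python standard library; the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import List, Tuple


def normal_fractile_means(g: int) -> Tuple[List[float], List[float]]:
    """Fractile points theta_0..theta_g and truncated means mu_1..mu_g
    for a standard normal X split into g equal-probability groups."""
    if g < 1:
        raise ValueError("g must be at least 1")
    dist = NormalDist()
    thetas = ([-math.inf]
              + [dist.inv_cdf(i / g) for i in range(1, g)]
              + [math.inf])

    def density(t: float) -> float:
        # The density is taken as 0 at the infinite end points.
        return 0.0 if math.isinf(t) else dist.pdf(t)

    mus = [g * (density(thetas[i - 1]) - density(thetas[i]))
           for i in range(1, g + 1)]
    return thetas, mus
```

For g = 2 this gives $\theta_1 = 0$ and truncated means of about $-0.7979$ and $+0.7979$, the half-normal means.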
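5.2. Let $(y_1, x_1), \dots, (y_n, x_n)$ be n independent observations on the random
variable (Y, X) and rearrange the sample thus,

$$(y_{(1)}, x_{(1)}), \dots, (y_{(n)}, x_{(n)})$$

so that $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$.
Assuming that $n = m \cdot g$ where m and g are integers, we define

$$u_i = \frac{1}{m}\sum_{r=(i-1)m+1}^{im} x_{(r)}, \qquad v_i = \frac{1}{m}\sum_{r=(i-1)m+1}^{im} y_{(r)}, \qquad i = 1, \dots, g \qquad \dots\ (5.2.1)$$

$$\sqrt{m}\,(u_i - \mu_i) = \xi_i(n), \qquad \sqrt{m}\,(v_i - \nu_i) = \eta_i(n), \qquad i = 1, \dots, g,$$

$$\sqrt{n}\,(x_{(im)} - \theta_i) = \zeta_i(n); \quad i = 1, \dots, g-1, \qquad \dots\ (5.2.2)$$

$$\zeta_0(n) = \zeta_g(n) = 0.$$

Also, let $\bar x$ and $\bar y$ be the sample means,

$$\sqrt{n}\,(\bar x - \mu) = \xi(n), \qquad \sqrt{n}\,(\bar y - \nu) = \eta(n). \qquad \dots\ (5.2.3)$$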

5.3. The main theorem of this section is

Theorem 2: The distribution of $\eta(n) = (\eta_1(n), \dots, \eta_g(n))$ tends to a multi-
variate normal distribution.

Proof: To prove this theorem we proceed as follows :

We know that the distribution of $\zeta(n) = (\zeta_1(n), \dots, \zeta_{g-1}(n))$ tends to a multi-
variate normal distribution N, in the strong sense of Theorem 1 since the densities
converge. For instance see Cramér (1946). We will demonstrate the

Lemma 1: The conditional distribution of $(\xi(n), \eta(n)) = (\xi_1(n), \dots, \xi_g(n),
\eta_1(n), \dots, \eta_g(n))$ given $\zeta(n) = \zeta$ tends to a multivariate normal distribution where the
asymptotic means of $(\xi(n), \eta(n))$ are linear functions of the $\zeta$'s and the asymptotic
variances and covariances are independent of $\zeta$.

An application of Theorem 1 shows that the distribution of $(\xi(n), \eta(n),
\zeta(n))$ tends to a multivariate normal distribution. In particular the distribution of
$\eta(n)$ tends to a multivariate normal distribution. Hence the theorem.


Proof of Lemma 1: We prove the lemma for the case g = 2.


Le c 1 am - ЖЕСІ +2 am) =m ) ] 


(5.3.1) 


18 (0) = ут (esa о "s | 
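As soon as $\zeta_1(n)$ is fixed at $\zeta_1$ the sample splits up into three independent parts, namely:
m independent observations on the random variable (Y, X) truncated to the
region

$$\theta_1 + \frac{\zeta_1}{\sqrt{n}} < x < +\infty,$$

(m-1) independent observations on the random variable (Y, X) truncated
to the region

$$-\infty < x < \theta_1 + \frac{\zeta_1}{\sqrt{n}},$$

and one observation at $\theta_1 + \dfrac{\zeta_1}{\sqrt{n}}$.

Thus $(\xi_1(n), \eta_1(n))$ and $(\xi_2(n), \eta_2(n))$ are the normalised means of the samples
in the two independent portions. In virtue of the Central Limit Theorem the
distribution of these random variables given $\zeta_1(n) = \zeta_1$ tends to a multivariate normal
distribution. The asymptotic means and variances are readily calculated. Let us
denote the asymptotic mean and variance of a sequence of random variables $Z_n$
by $\mathcal{E}(Z_n)$ and $\mathcal{V}(Z_n)$. We find

$$\mathcal{E}(\xi_1(n) \mid \zeta_1(n) = \zeta_1) = \sqrt{2}\,(\theta_1 - \mu_1)\,g(\theta_1)\,\zeta_1$$
$$\mathcal{E}(\eta_1(n) \mid \zeta_1(n) = \zeta_1) = \sqrt{2}\,(\lambda(\theta_1) - \nu_1)\,g(\theta_1)\,\zeta_1$$
$$\mathcal{V}(\xi_1(n) \mid \zeta_1(n) = \zeta_1) = \sigma_1^2 \qquad \dots\ (5.3.2)$$
$$\mathcal{V}(\eta_1(n) \mid \zeta_1(n) = \zeta_1) = \tau_1^2$$
$$\text{cov}(\xi_1(n), \eta_1(n) \mid \zeta_1(n) = \zeta_1) = \rho_1\sigma_1\tau_1.$$

Similar expressions for $(\xi_2(n), \eta_2(n))$ hold.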
We next show that (Em), та») and (Ёл), 2901) are asymptotically 
equivalent, i.e. ЕЕ (п) + Es) — ША tends to zero ав n 00. We have 
to show that 


1 
l Bim) + Em E(yqni3)* > 0 ... (5.3.3) 
m T 


Жата) = S Ez | 600) = &) 
-Е (0+ 8. ) <% 2. (8.34) 


Byin) = z Ey) = &) 


E. S ( қ) BET y) < œ 2. (5.3.5) 
ji a cr | | 


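(5.3.4) and (5.3.5) are enough to prove (5.3.3). Thus the lemma is proved when g = 2.
A similar proof holds for any g. In the case of g groups we have

$$\mathcal{E}(\xi_i(n) \mid \zeta(n) = \zeta) = \sqrt{g}\,\{(\theta_i - \mu_i)\,g(\theta_i)\,\zeta_i - (\theta_{i-1} - \mu_i)\,g(\theta_{i-1})\,\zeta_{i-1}\}$$
$$\mathcal{E}(\eta_i(n) \mid \zeta(n) = \zeta) = \sqrt{g}\,\{(\lambda(\theta_i) - \nu_i)\,g(\theta_i)\,\zeta_i - (\lambda(\theta_{i-1}) - \nu_i)\,g(\theta_{i-1})\,\zeta_{i-1}\} \qquad \dots\ (5.3.6)$$
$$\mathcal{V}(\xi_i(n) \mid \zeta(n) = \zeta) = \sigma_i^2$$
$$\mathcal{V}(\eta_i(n) \mid \zeta(n) = \zeta) = \tau_i^2$$
$$\text{cov}(\xi_i(n), \eta_i(n) \mid \zeta(n) = \zeta) = \rho_i\sigma_i\tau_i, \quad i = 1, \dots, g \qquad \dots\ (5.3.7)$$
$$\text{cov}(\xi_i(n), \xi_j(n) \mid \zeta(n) = \zeta) = \text{cov}(\xi_i(n), \eta_j(n) \mid \zeta(n) = \zeta) = \text{cov}(\eta_i(n), \eta_j(n) \mid \zeta(n) = \zeta) = 0, \quad i \ne j.$$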
Let us denote an equation in (5.3.6) by


$E(\eta(n) \mid \zeta(n) = \zeta) = \zeta M$, where $M$ is a $(g-1) \times g$ matrix.


Let us denote the variance covariance matrix of $\zeta(n)$ in the limit by


$K = (k_{ij})$.
ky = (9—) — 1 


WU о ЕНЕ 
дро) 55 67 p 


Then the asymptotic distribution of $\eta(n)$ has for its density function
const. exp — mw! 
E pite e 


"where A= (y) = SLM’ КМ. vs (5.8.8) 
Sic. H жеты Қана. 
Шеше М, 440) 926-1) Же iii) 
ў 4 ; D 5.3.9 
М-Н) ps 
Using (5.3.8) and (5.3.9) we can write $\lambda_{ij}$ as


=; M, M; if j>i 


= ‚% М HAO) =A) 4-7? ifi=j 


Let us denote the correlation coefficient between $\eta_i$ and $\eta_{i+1}$ by $\rho_i$, $i = 1, \ldots, g-1$.
If the regression of Y on X is monotonic, we can easily see that all the $\rho_i$
are positive.


6. APPLICATION OF THE RESULTS IN SECTION 5

6.1. In Section 2 we saw that our statistics А, A, Г were all function of 

W (узу... 806. 
Define : Уча (т) = Ут Жаз) = Ут) 
= (п) — Nua (n). pee (0.151) 
Theorem 2 shows that the distribution of (Уи»(п)) = (Улаз(®), ... Ума) 
tends to a multivariate normal distribution with mean zero and variance covariance 
matrix 2A. Hence Ry = mA% = mE wis) = УХ аг) has an asymptotic distri- 
bution which is а mixture of 3? distributions, the compounding coefficients being the 
latent roots Ду, ... A, of the matrix 2A. Under the assumption that the populations 
РІ? апа РЗ“ are identical we see that А2, = Ry, and 2mA? = В, have the same 


asymptotic distribution, Taking B = Ain equation (2.4.2) we get 
TE г 
Ti; = Wan (= ) W (12) 


i.e." ‘ Qu = т = У» (п) = -) Paala) 


and this is distributed asymptotically as а д? with g degrees of freedom, Similarly 
Qoa = тГ and 0, = 2тГ? have the same asymptotic distribution. 


6.2. We also see that 


z =f | 2, (694 

Е (1) = (6.2.1) 

From ‘Takeuchi (1961) we see, that if ie 2) is distributed as a bivariate 
normal with parameters ду, #1 08, 01, P» then 7 - 3 Ан- @@ (1—03) as g— о. 


аз у 9. 


2-2 =. constant X f 
Thus ' ГЕ (А?) а Ges 


The results about the expectations stated in the summary follow.
6.3. We shall later on prove the following.

Lemma 2: Ульи) and Yan) are asymptotically independent for any 
i and j. 

From this lemma it follows that R; and В, are asymptotically independen: 
so that 


Ry эд? 


Rath, А+ 


и Е 
2 


is asymptotically distributed as the ratio of independent mixtures of $\chi^2$ distributions.
It also follows that $Q_1$ and $Q_{12}$ are asymptotically independent. Hence


0295 = nm is asymptotically distributed as an F with g and 2g degree 
1271 за гіш 


of freedom. 
Proof of Lemma 2: 


аға) = ут —) = ERAS 
(6.3.1) 


Vj (2) = A/2m(v]* —vj*). 


Suppose we show that y/m ( 01° — Dad —0 in probability, i.e., y; (n) is asympto- 
tically equivalent to 
VE vm 09—30) 


=V 2 n) Нин) рат) — han). e (6.3.2) 


From the expressions (6.3.1) and (6.3.2) we at once deduce that $\eta_{i(12)}(n)$ and $\eta_j(n)$
are asymptotically distributed as a bivariate normal distribution with correlation
zero, and hence are asymptotically independent.


It now remains to show that 
an) = ут. (Р = utd) —0 in probability. 


or that E(o3(n))—0 as n—oo 
We assume that g — 2. 


. Then alm = быт in case abo Sadr 


р where 24 is the summation over all indices ғ such that im; < ad < ашыу and У, 
is b: summation over all indices such that 2) < <, Ме have 
$1 zn = ов in case Teman < а). The probability of either event 

. #rom these expressions for «,(n) we see that to prove E(o3(n))0 it is enough 


88 


. to prove that z( 



Ут 

included in the summation’ Уд. Naturally s is a random variable. We give the fol- 

lowing expressions that can be derived after some algebra. Let us denote 

VAE om0) by 6. 


13 safe 5 С 
х) — 0 аз т э оо. Let 8 be the. number of observations 


1 : ial 
ЖЕР Un— У» v) ) |a, Gaye, а < = <x const x (1--0(1)) 


Ул | 
(6.3.3) 

Е (24 2,0,0 < 8) = 8-8 eo EU ) s. (0.3.4) 

E (а-аа « 0) « o. : ... (6.3.5) 


Expressions (6.3.3) through (6.3.5) suffice to show that $\alpha_i(n) \to 0$ as $n \to \infty$. The case
of general g is dealt with in a similar way.

The fact that $\alpha_i(n)$ tends to zero in probability means that the
ordinate of the combined fractile graph $G^{12}$ tends to the mean of the ordinates of the
fractile graphs $G^1$ and $G^2$. Thus we see that the graph $G^{12}$ tends to lie completely
within the graphs $G^1$ and $G^2$.

6.4. Let $r$ be the number of intersections between the fractile graphs $G^1$
and $G^2$; $r$ is a random variable. We know that these graphs intersect between
$i$ and $i+1$ if and only if $\eta_{i(12)}(n)\,\eta_{i+1(12)}(n) < 0$, i.e., the signs of
$\eta_{i(12)}(n)$ and $\eta_{i+1(12)}(n)$ are opposite. Let us replace the sequence $\eta_{1(12)}(n), \ldots, \eta_{g(12)}(n)$
by their signs, thus $(+, -, +, \ldots, -)$ (say). The number of changes of 'runs' in this
sequence, i.e., the total number of runs in this sequence minus one, is the number of
intersections between the graphs $G^1$ and $G^2$. The probability that $\eta_{i(12)}(n) < 0$ is $\tfrac{1}{2}$
for any $i$. But $\eta_{i(12)}(n)$ and $\eta_{j(12)}(n)$ are not independent even asymptotically for
any distribution. This feature mars the beauty of the picture and the theory of
runs with independent tosses of a coin cannot be applied here.
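
The counting rule described in 6.4 (intersections = number of runs of signs minus one) can be sketched as follows. This is only an illustration: the function names and the convention of skipping exactly-zero differences are assumptions, not part of the paper.

```python
from typing import List, Sequence


def _signs(diffs: Sequence[float]) -> List[int]:
    """Signs (+1 / -1) of the non-zero differences, in order."""
    return [1 if d > 0 else -1 for d in diffs if d != 0]


def count_runs(diffs: Sequence[float]) -> int:
    """Total number of runs of like signs in the sequence of differences."""
    signs = _signs(diffs)
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def count_intersections(diffs: Sequence[float]) -> int:
    """Number of sign changes, i.e. number of runs minus one (0 if empty)."""
    return max(count_runs(diffs) - 1, 0)
```

Here `diffs` stands for the sequence of differences between the ordinates of the two fractile graphs at the g fractile groups.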


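Anyhow we will calculate $E(r)$. Define the random variables $Z_i$, $i = 1, \ldots, g-1$,


by the relations

$$Z_i = \begin{cases} 1 & \text{if } \eta_{i(12)}\,\eta_{i+1(12)} < 0, \\ 0 & \text{if } \eta_{i(12)}\,\eta_{i+1(12)} > 0. \end{cases}$$

Then $r = \sum_i Z_i$; from this we find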
5 1 


Woo кы 177 (84) 
1 п 
We note an interesting fact that $E(r) < \frac{g-1}{2}$ if all the $\rho_i$ are positive.
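
A small numerical sketch of the expectation of r. It assumes that consecutive differences are asymptotically bivariate normal with mean zero and correlation rho_i, and uses the standard orthant result that two such variables have opposite signs with probability arccos(rho)/pi. Whether the damaged equation (6.4.1) has exactly this form cannot be confirmed from the text, so the function below is an assumption-labelled illustration.

```python
import math
from typing import Sequence


def expected_intersections(rhos: Sequence[float]) -> float:
    """Sum over i of P(sign change between groups i and i+1).

    Assumes zero-mean bivariate normal pairs with correlations rhos[i];
    P(opposite signs) = arccos(rho) / pi for such a pair.
    """
    return sum(math.acos(rho) / math.pi for rho in rhos)
```

With all correlations zero the value is (g-1)/2, and any positive correlations pull it below that bound, in line with the remark above.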


2 
2X iw 4] ; 
6.5. The concentration ratio У із 1 И (see (2.06.1) From 


20 
9 1 


Theorem 4,1 we see that the ор are asymptotically distributed as a multivariate 
normal. Hence 4/;(E—0) is asymptotieally distributed with mean zero and 
variance IAI’ where 1 — (I, ... 1) 


7. REMARKS AND SUGGESTIONS 


7.1. We have seen that if two samples (1) and (2) are taken from the same 
population then л = Е%(а(п)) =0. Thus 7 = 0 may form the hypothesis for con- 


sistency of the samples. Since we know that y(n) is asymptotically normally 
distributed, and we have only one set of observations namely 7,,9)(n), the best test, in 
а sense which can be made precise large samples, would be that based on 


Tas) ( 41) з\п) = T$s. 


Starting from Theorem 5.1, the study of the distribution of the error area A
should be possible and would be interesting. Again if A were estimated in practice
the change in the asymptotic distribution of $\Gamma^2$ has to be studied.
SOME LIMIT THEOREMS IN REGRESSION THEORY


By K. R. PARTHASARATHY
and
P. K. BHATTACHARYA
Indian Statistical Institute


SUMMARY. (X, Y) follows an unknown bivariate distribution with 0 < X < 1, the regression
of Y on X is continuous, and a sequence of observations on (X, Y) is made. An estimate of the unknown
regression function based on these observations and motivated by the method of Fractile Graphical Analysis
has been suggested. Its large sample properties, viz., convergence in probability and almost sure uniform
convergence to the true regression function, have been investigated. Large sample tests for a specified
regression function have also been proposed for the case when the conditional variance function of Y on
X is known and for the case when it is unknown.


1. INTRODUCTION 


Let us suppose that X and Y are real valued variables having a certain joint
distribution function such that X takes values in the interval [0, 1] and all conditional
absolute moments of order up to p > 3 exist when X is fixed at any point x. If we can
make a sequence of independent observations on (X, Y) the question naturally arises
as to how we can construct from these observations an estimate of the unknown
regression function of Y on X, possessing certain properties like convergence in
probability, almost sure uniform convergence etc., to the true regression function.
Another important problem is that of constructing at least a large sample test for a
specified regression function. In Section 2 we shall make use of the technique of
Fractile Graphical Analysis suggested by Mahalanobis (1958) to estimate the regression
and analyse its large sample properties and in Section 3, construct a large sample
test for the regression. The crucial part of our analysis consists of the utilization of
certain results concerning the error of approximation by the central limit theorem
and an upper bound of the tail probabilities in the distribution of sums of independent
and bounded random variables. These results are given in the Appendix for reference.


2. ESTIMATION OF THE REGRESSION FUNCTION


Let $(x_1, y_1), (x_2, y_2), \ldots, (x_{nk}, y_{nk})$ be independent observations on (X, Y),
$x_{(r)}$ the r-th order statistic in the set of observed values of X and $y_{(r)} = y_j$ if $x_{(r)} = x_j$.
Throughout this section we shall assume that the distribution function of X is continuous
and strictly increasing so that the probability of any two observations on X being
equal is zero. Then the statistics $y_{(r)}$ are defined unambiguously for almost all samples.
In order to simplify notation we write for r = 1, 2, ..., k and s = 1, 2, ..., n,


` Trinta) = Xp; o А = Ym 
ELY |X = =), МУХ = а] = Е (2), 
E[| Y—o(x)|"|X = 2] = fuz), m = 2, 3,..., р, 
Дам) = er Ene) = Ern Bn (н) = тм 
7. = i yas = i ут, 


x n - 
= Хы, Ф = > gral, 
4-1 Ч 2-1 
= л n б E - 
Ва =D Bonin, в = En. wes (2.1) 
1 gel 


_ Now we define two functions fj, (т) and $n (£) as follows : 
Jn @) = Hi if 0<2 <x, 
=. = Ист сы «tm a 2. Е-1 
= Ш зн < ад 1. ОО (2.2) 
ene (=) = Ф if OS eS xi, 
=, if ain СС By r= akl 


7 aa ө if жат < Л, ... (2.3 
Lemma 1: If the random variable X has a strictly increasing continuous
distribution function F(x), then


nemen |- ме (На) 


dor eei m positos integer m. 
е We have 


Pen <= т о-в Шр. . vee (24) 


nt tiones 


шын ное (А 2) o the sum Sy of N independent binomial random 
variables each with probability for success Sue to p, we have 


—Nó 


т ЛЕ 


87 I tog (2 aN + |9 
FISy 2 Nota) < - 


=] xm (s гъ 1-4 2l ifp <4, 
which, t and ы о less than unity, becomes
exp [=x ( STET 
Р/В» > М(р-+8)] < RN UI) 
е? [58 ( 1-2 | ite <q. 
We have : 


БЫ 


ТЕГ ТЕТЕ (ар) 


which, after using (2.4) and an application d (2.5) for à — = and two binomial sums 


; ee i r 
with probability for successes T and 1- T =i and number of summands 


equal to nk, becomes greater than 4m. xp ie А | 4-98. exp | md ( 1— 4) for 


every fixed integer т. 
Lemma 2: Under the conditions of Lemma 1, if $\varphi(x)$, the regression function
of Y on X, is continuous then for any $\varepsilon > 0$
А 1 
= qe 
P[ mp, 19 @-# > ЕСЕ ү | ^m ЕБІ 
) is а constant which depends оп € and e only. 


for every fixed nteger m, and where cle, e 
Proof: Let б[а, $ = ШШ 3 Jolt) — plt) | 
ag 


Wi, Te 
КО ор ЫЫ enn 
к ан 
then | аһ (в) = dum |е(2)-е(2)| < 9 (2.6) 
Thus 


Pld (е) > 6 < ШЕ «|р ЕЗ Ka, & F^ (25%), алауы i-i] 


th) com cP (CAR) ава 2s (2:7) 


„ЛЕР pet 
Because of (2.6) the first term on the right side of (2.7) becomes less than 
РФ, > gi Since F is continuous and strictly increasing 7-1 is continuous. Since 
| 6(т) is uniformly continuous we have д, < є for k > бе, +). "Thus the first term


vanishes for k > ke, е). Тһе second term becomes less than 4m exp [тт ] + 2k 


exp | та ds - 2.)] because of Lemma 1. Combining the two we get the required 


inequality. 
Hereafter, for simplicity, we shall assume that $\beta_2(x) > c_1 > 0$ and $\beta_m(x) < c_2$
for all m = 2, ..., p and all x. However, from the proofs we can see that all our
results hold under more general conditions.

Lemma 3: If $\beta_2(x) > c_1 > 0$ and $\beta_m(x) < c_2$ for all m = 1, 2, ..., p and all
x, then for any $\varepsilon > 0$


r 


Р (eur TRENT, >< б. к 


where C is a constant depending only on $\varepsilon$, p, $c_1$ and $c_2$.
Proof: We have 


Р(е) = Р [sup || fa ()-ө (2)! > |а, ..., ть] 


= бәр, 1%-е > в ..., ты] 
19-% | 9r— e» | [2277 
S P bn (2,8 
x Увьт ^ di B 


Making use of (A7) and noting that Т, < У" we have 


evn > 03). log Т,, for n ‚> ne). 


Мв 
де» оу ‘Theorem Al we obtain 
E Pe«me( A) бы в ИС 
Cg. 14 ($8 ye pe 
уз a 
where c, = * апа 0, is a positive constant which depends only on p and 
Pe І і 
Ф(ж\ — Еу д 1 „з 
с Jom dt. ея ire е ee 22.0, (2.9) gives, for n>n/(e), 
E S ; 
ГЕ CH? cake 
“Pos eee die 325i 
$ 1242 
E k BERN k 
exe. р, 6, 22 656 
al 9 «| 7. 625% tas 


which completes the proof. ' 
Now we shall state and prove the main result of this section.

Theorem 1: If the p-th conditional absolute moment of Y exists when X is
fixed at any point x, p > 3, X has a strictly increasing continuous distribution function
in [0, 1], the regression $\varphi(x)$ of Y on X is continuous and $\beta_2(x) > c_1 > 0$, $\beta_m(x) < c_2$ for
all m = 1, 2, ..., p and for all x, then for $n > (4+\delta)\, k \log k$, $\delta > 0$,
$d_n = \sup_x |f_n(x) - \varphi(x)|$ converges to zero in probability as $k \to \infty$ and for $n > (8+\delta)\, k \log k$,
$\delta > 0$, $d_n$ converges almost surely to zero as $k \to \infty$.

Proof: The first part of the theorem is an immediate consequence of Lemmas
2 and 3 and the second part follows from the same lemmas and the Borel-Cantelli
lemma.
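
The estimate studied in this section can be sketched as a step function built from fractile groups. The sketch assumes, as the surrounding text suggests, a sample of nk observations sorted by x and cut into k consecutive groups of n, with the estimate equal to the mean of y in each group. The function names and the return format are illustrative assumptions, not the authors' notation.

```python
from bisect import bisect_left
from statistics import fmean
from typing import List, Sequence, Tuple


def fractile_regression(xs: Sequence[float], ys: Sequence[float],
                        k: int) -> Tuple[List[float], List[float]]:
    """Return (cut_points, group_means) for k fractile groups of equal size."""
    pairs = sorted(zip(xs, ys))          # order the sample by x
    n, rest = divmod(len(pairs), k)
    if n == 0 or rest != 0:
        raise ValueError("sample size must be a positive multiple of k")
    groups = [pairs[r * n:(r + 1) * n] for r in range(k)]
    means = [fmean(y for _, y in g) for g in groups]
    # right end point (largest x) of each of the first k-1 groups
    cuts = [g[-1][0] for g in groups[:-1]]
    return cuts, means


def evaluate(cuts: Sequence[float], means: Sequence[float], x: float) -> float:
    """Step-function estimate: mean of the group whose x-range contains x."""
    return means[bisect_left(cuts, x)]
```

For example, `cuts, means = fractile_regression(xs, ys, k=10)` followed by `evaluate(cuts, means, 0.35)` gives the estimated regression at x = 0.35.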

3. LARGE SAMPLE TEST OF A SPECIFIED REGRESSION FUNCTION 


In this section we shall consider the problem of testing the null hypothesis $H_0$
that the regression function $\varphi(x)$ is equal to a specified function $\psi(x)$.


.Consider the statistic 


|у,—/| 
Tae Bl yes 
5 Bee ҮГІП 
п 
fe E Mën) 
where jj, and B, are as defined іп (2.1) and p, = *=" xim If we had known the 


probability distribution of $\tau_{nk}$ under the null hypothesis, then for any given level of
significance $0 < \alpha < 1$ we could apply the following test:

Reject $H_0$ if and only if $\tau_{nk} > \tau_{nk}(\alpha)$,
where $P[\tau_{nk} > \tau_{nk}(\alpha) \mid H_0] = \alpha$.
Theorem 2 enables us to apply such a test at least in the large sample.

Theorem 2: Jf Ва) > а > 0 and f(x) < с, < co for all. x and 
m= 1, 2,.., p, р > З then we have for n > (klog К) 2-2 

i 1 
Jim Piru «АННЫ = ox [— у,“ 


where Ay = V204 log k—3 log log k). 
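
The limit in Theorem 2 can be explored numerically. The extraction of the theorem is damaged, so the sketch below rests on two loudly stated assumptions: that the constant reads $\lambda_k = \sqrt{2\theta + 2\log k - \log\log k}$, and that the limit is $\exp(-e^{-\theta}/\sqrt{\pi})$. Under the normal approximation used in the proof, the probability that k independent standardised group deviations all stay below $\lambda_k$ in absolute value is $[\Phi(\lambda_k) - \Phi(-\lambda_k)]^k$, and the code compares that product with the assumed limit.

```python
import math


def lambda_k(k: int, theta: float) -> float:
    """Assumed reading of the critical constant in Theorem 2 (requires k >= 3)."""
    return math.sqrt(2 * theta + 2 * math.log(k) - math.log(math.log(k)))


def prob_max_below(k: int, theta: float) -> float:
    """[Phi(lam) - Phi(-lam)]^k for lam = lambda_k(k, theta)."""
    lam = lambda_k(k, theta)
    tail = 0.5 * math.erfc(lam / math.sqrt(2))  # 1 - Phi(lam), stable in the tail
    return math.exp(k * math.log1p(-2 * tail))


def limit_value(theta: float) -> float:
    """Assumed limiting value exp(-exp(-theta) / sqrt(pi))."""
    return math.exp(-math.exp(-theta) / math.sqrt(math.pi))
```

Convergence is logarithmically slow, so even for very large k the two quantities agree only to two or three decimal places.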


Proof: Under Hy, we have 4 
P() = Ріт, [L Akt; +> Zin] 


- [Ф(А„)—Ф(—Л„)-Ее] 

теі vi Barln rel 4 

where because of (А8) and the fact that A, < »/4(p—2) log T,, for k > k(0) we get 
(1-54) е 1 

Gel абв Sepa 

| th | y/n А > | 


k ЕСА La b ds | 
ПР 25 olar tes in | = 
where $c$ is a constant depending on $p$, $c_1$, $c_2$. Thus
_ log Р(Ар = k log [9(4,)— —Ф(—А„)]+ Ze, 


i e 
e. : 121 < $ | [1+ БА | | 
є р Since for safficiently large Ё 


kee Е lt: 
қ ФФ) 5! 
| and log (14-2) = z4-vz*,|v| < 1 for |z| <}, 
LX 1 
12.1< об | з е Md : 
Substituting for A, in (3.1) we have 


À k 
Izd < a [| 3 


‚ V og ®. е-#+ =z |. 


Sinóe т > (klog k)!»-? it is easy to see from (3.2) that | 2190 as k— 00. 


This completes the proof.


The statistic $\tau_{nk}$ defined above can be used only when $\beta_2(x)$ is known. The
natural way of modifying this in the case of unknown variance function is to replace
~ fly by Я where y? and ӯ, are as in (2.1). Since this leads to certain complications 


E in evaluating the cA distribution we replace it by 


c a 5% [yj — n) (where т, is as in (2.1)) ап4 тие 


as з d 
buy = APT сұық 


То 1754 that this replacement does not change т, TOMOS: in large samples we prove 


the following lemma. 


" I 8 
lim log k. Its ues 
Dm igt. sp F—1 =o, 
x ыы ose 
EE Proof: We have for any 6 2 0, 


= A | P [sw 


є | Lemma4: Т/Х has a continuous distribution function and Y is a bounded 
random variable, Bo(x)>c,>0 and if n> (log k)®+°, 3>0 then under the null hypothesis Ho 
Then from Theorem A2 we get


% 
Ур | УЕ Cy = 
is 2 | М-ы T Yr т, vn | КЕЯ 
р 
апа УРУГУ" сау NECI Г быт 
cy | ier (е Vn —?n)| > 1198 x] < 2k exp | (og t ] 


where бү апа b, are constants depending on сл, € and the upper bound of | У |. Thus 
whenever n > (log k^? the series. 


EH | fs 
ааа 


converges. An application of the Borel-Cantelli lemma completes the proof of the lemma.


Remark: 18 is possible to show that if n > k (log k)?*^, à > 0 then log k 
JEL 
Bar 


variable. 


converges to zero in probability even when У is not а bounded random 


sup 
т 


The following theorem enables us to use the statistic $t_{nk}$ for testing a specified
regression function when the conditional variance function is not known.

Theorem 3: If X is a random variable with a continuous distribution in
[0, 1], Y is а bounded random variable and palt) > су > 0 then for n > (k log ky"? 
for some p > 3 we have 


2 m me 1 -0 
Jim Ра < de | Hol = өз | ee 


where A, = V20+log Ё—3 log log k). 


Proof: Tt is easily seen that 
Р/ти < +V) lHo] < Р < А180] < Plu < MAHU) Hol — (3.3) 


where т„ is as in Theorem 2, Un = sup (м1 ) and V, = іші (МЕ ) ; 


r 


: 2, 2 log k-+ log log k 
Further, Ріта < МА+ Va] Р ie Lots et OB BE Lint Upk e Д 


where $Z_{nk}$ converges to unity in probability and $v_{nk}$ converges to zero in probability
because of Lemma 4. This fact together with an application of Theorem 2 leads
to the result
lim P[r,, < MUl +V л)] = exp [- = e 1. (3.4) 
E» ; Мт 
A similar procedure leads to the result


М = ET 3.5 
Jim. Ptr, < А АНИ] = exp | E e ]. S. (3.5) 


(3.3), (3.4) and (3.5) complete the proof.
Finally we shall state a theorem concerning the order of change in the 
level of significance if we apply the test ‘reject H, if and only if т, > A, where 


[(A,)—4(—A,)] = 1—a instead of the test ‘reject H, if and only if ,, < 7,,(x) with 
Р[т < Tl) | Н] = 1—a. 


91k 
| 


Theorem 4: Let A, satisfy DA-DA) = 1—a and a <1-|1-,/ 
Then under the conditions of Theorem 2 we have 


ет 


3 : 
Р AE ]—(1 (os al k 
| Pltm < Ав [Но] -—(1—а) | & c. SR = a+ P=? 


| 
| 
where $c$ is a constant depending on $p$, $c_1$ and $c_2$.


4. REMARKS 


In this paper we have not given any consideration to the power of the
$\tau_{nk}$ and $t_{nk}$ tests proposed in Section 3. It would be very interesting if lower bounds
for the power of these tests could be given in terms of the supremum distance between
the regression functions under the null hypothesis and the alternative hypothesis.
Though this has not been done, we can at least easily verify that if the regression
functions under the null hypothesis and the alternative hypothesis are both continuous,
the $\tau_{nk}$ and $t_{nk}$ tests are consistent, i.e., for any given level of significance $0 < \alpha < 1$,
the probability of rejecting $H_0$ tends to unity as $k \to \infty$.


For the validity of the above theorems we have imposed certain conditions
on the regression function $\varphi$ and the variance function $\beta_2$ separately. In some cases
(e.g. Binomial or Poisson distribution) $\beta_2$ is a function of $\varphi$. This however does not
affect our analysis in any way so long as the conditions on $\varphi$ and $\beta_2$ remain valid
separately.


Е[#у(&)]@Ф-®/> are finite. 


* $\Phi(x)$ is the usual normal distribution function.
Appendix


In this section we shall state and sketch the proof of some of the results which
were utilised in our paper.


Let $X_1, X_2, \ldots, X_n$ be a sequence of independent random variables with
$E X_r = 0$ and $E X_r^2 = \sigma_r^2$ and $\beta_{\nu r}$ the $\nu$-th absolute moment of $X_r$. We shall suppose
that the moments of order $k > 3$ exist. We write


cel В, п 

By = a sc Б) Pm = ру Tm = 5% ^. (AI) 

Then € Pin < pil, v= 2,8, ..., Е. о) 
Let f,(!) and B be the characteristic function and distribution funetion 

of E We now state two lemmas one of which is due to Cramér (1937) 
m. n 


and the other due to Esseen and Berry. 


Lemma Al: (Cramér) For |t] < < ТЗ we have 


| k-3 Te Zu 
м 


where P, (it) =A On (EY, 


€ jy = ср ры" and c; and c, are constants depending only on k, су being positwe. 


Lemma А? : (Berry-Esseen). For |t| <Tsn 


| —t]» 4 БЕА 
fle m |t? e D 
А е PX 
i gt) e ^^ | E aa) | ... (АЗ) 


Hereafter we shall denote by 0; any positive constant which depends only оп 0. Then 


by using (А1), (A2) and Lemma Al we have 


27 
tt Pali) ГЕР ПЕ айы. ирен 
v n2 әзі je vel j 
E] i 
СТЕ ppa сб, X А4 
ыл АНА тр an 
From (A3) and (A4) it is seen that the inequality of Lemma A2 can be rewritten as


es > [217+ ете! 


«a0 040) excl. t |3. ете!» ҒАР EU S 


+T: 
Lemma АЗ: f: 


^ Ј, (0) — Ilt) 
t 


6, 
TANA 
Case 1: Let Tin < 1, then Т,, < Туз. Hence applying Lemma АТ we get 


ҮТТЕ 
f” 440-040 “ < pli 


—T in 

Case 2: Теб T,, > 1. From (43) it is easy to seo that 1 < Ты < Tan : 
J Т) 
Thus P 
—Ты 


ГАСА 940 а PLE 
bs тыз тра 


= 21,-- I, say. 


$, In thes region of integration of І, Lemma А? is applicable and for Г, Lemma АТ is 
applicable, _ Thus from (A5) we have 


> dy Jit— m С < mrt Eget, < O/T. 
Let | Q6) > 0,0 < qt) < 1, Ге = 1. 


40 = i ё Qa, q(t) < та» 


te 
JJ, (t| 3534404 < co. 2. (А6) 


ov М. = ое then for |x| > 1 


йл en S E F 


+ аы = J) 
ст TE Тез 


m 
Proof: Case 1: Let Ty, < 1, then Tyn < 7.4/3. Applying Lemma: А1.


P(T4) < T trae pe mee S 2 Еа 
б Tess ka 3k-71,—12/2 fe 1 
5 ты ret да < cT ТЕР 


Case 2: Let T,,- 1. Then as before 1 < Т < ined 


T. тр 
P(T,)—2 f + f 
тр T 


240-040) шуй = 21,4, sa. 


As before by applying Lemma А1 for 7, and Lemma А? for J, and proceeding 
as in Case 1 we obtain 


(2 1 
PE np) « р. 

(Tg) < hla Te 
Utilising Lemmas A3 and A4 and proceeding along the same lines as Esseen (1944)
we can prove the following results.


Theorem A1: Let $X_1, X_2, \ldots, X_n$ be a sequence of random variables with
mean zero and finite absolute moments of order k(k > 3). Then for «> А» 


Р„(—Ф) 


(2) — а) 3 E and for 2<), 


0, 1 
Sepp nes 


Ра) Ер SPA) cg [р а 


nile 


where А, = 140 if T, < 1 and max [14-9, 1/215-9)0-2)! log Tn] if Ты > 1. 


а, 

б, and б, ате constants. which depends only оп k and д, Pyn (-Ф) we (E. 
- .. 

Chins Фоз%(>) and сут are 08 in Lemma А 1. 


Remark: From the above theorem it can be easily deduced that 


рш—Ф@)|<-үу ев 2А n (AD) 
F,(2)—9() | € % КЕ + A fora <А. (A8 
Let $\xi_1, \xi_2, \ldots, \xi_n$ be n independently distributed random variables with
$E(\xi_i) = 0$, $|\xi_i| \le c$, $i = 1, 2, \ldots, n$, $E\xi_1^2 + \ldots + E\xi_n^2 = \sigma^2$ and $\xi = \xi_1 + \ldots + \xi_n$. Then
we have the following theorem due to Prohorov (1959).


Theorem A2: Under the conditions stated above, for x > 0 we have


$$P[\xi \ge x] \le \exp\left[-\frac{x}{2c}\,\sinh^{-1}\frac{xc}{2\sigma^2}\right].$$
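
To see how such a tail bound behaves, the sketch below reads Theorem A2 as the inverse hyperbolic sine inequality $P[\xi \ge x] \le \exp[-(x/2c)\sinh^{-1}(xc/2\sigma^2)]$ (this reading of the damaged formula is an assumption) and compares it with the exact tail of a sum of n independent variables taking the values +1 and -1 with equal probability, for which c = 1 and $\sigma^2 = n$. The function names are illustrative.

```python
import math


def prokhorov_bound(x: float, c: float, sigma2: float) -> float:
    """Assumed form of the bound: exp(-(x / 2c) * asinh(x c / (2 sigma^2)))."""
    return math.exp(-(x / (2 * c)) * math.asinh(x * c / (2 * sigma2)))


def rademacher_tail(n: int, x: float) -> float:
    """Exact P(S >= x) where S is a sum of n independent +1/-1 variables."""
    # S = 2H - n with H ~ Binomial(n, 1/2), so S >= x iff H >= (n + x) / 2.
    h_min = max(math.ceil((n + x) / 2), 0)
    return sum(math.comb(n, h) for h in range(h_min, n + 1)) / 2 ** n
```

For n = 20 and x = 10 the exact tail is about 0.0207 while the bound is about 0.29, so the inequality is far from tight here but has the exponential decay the proofs rely on.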
THE WEIGHTED MEAN OF TWO NORMAL SAMPLES
WITH UNKNOWN VARIANCE RATIO


SIR RONALD A. FISHER, F.R.S.
Division of Mathematical Statistics, CSIRO, Adelaide


SUMMARY. The exact small-sample solution of the problem of the weighted mean flows
from an analysis parallel with that required for Behrens' problem. The algebraic forms are, however, more
complex, involving one more parameter, for which it is convenient to choose Sukhatme's d. The logical
basis of choice between weighted and unweighted means is considered.


1. ANALYTICAL PROCEDURE

Since the analytic demonstration of Behrens' test of significance for a difference
between the means of two Normal samples has been set out explicitly (Fisher, 1935),
it has been apparent that the problem of the distribution of the weighted mean of
two such samples could be resolved, in the exact terms appropriate to small samples,
by a similar analysis. In its mathematical generality, however, this problem is a more
difficult one than the simple test of significance of the difference, for in addition to
involving the two degrees of freedom, $n_1$ and $n_2$, for the two samples, the modular
angle

$$\tan\theta = s_1/s_2$$

of the ratio of the two standard deviations, and the level of significance required, it
involves also as a fifth parameter, the measure of discrepancy of Behrens' test as
defined by Sukhatme; namely

$$d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 + s_2^2}}$$

and this to no unimportant an extent.

In a recent paper in Sankhyā (Fisher, 1961) I have shown how this test, like
that of Behrens, can be exhibited as a verifiable assertion about frequencies, as must
indeed always be the case when the Reference Set adverted to in any statements of
probability has been explicitly specified.
Whereas Behrens' distribution is symmetrical, and so can be conveniently
tabulated by giving the deviate, positive or negative, determining limits outside
which d shall fall with a specified frequency, in random samples from the reference
set, as is the case with Student's t, the distribution of the weighted mean is
unsymmetrical in the general case. The analytical method of asymptotic expansion,
which will be used in this paper, is however, just as applicable as it is in the case of
Behrens' test (Fisher, 1941).


2. SPECIFICATION OF THE ANALYSIS 


The ordinate of a Student's random variable t, for n degrees of freedom, may
be expanded in Student polynomials, $P_r$, in the form

$$\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^2} \sum_{r=0}^{\infty} P_r\, n^{-r} \qquad (1)$$

in which the first six polynomials are

$$\begin{aligned}
P_0 &= 1 \\
P_1 &= (t^4 - 2t^2 - 1) \div 4 \\
P_2 &= (3t^8 - 28t^6 + 30t^4 + 12t^2 + 3) \div 96 \\
P_3 &= (t^{12} - 22t^{10} + 113t^8 - 92t^6 - 33t^4 - 6t^2 + 15) \div 384 \\
P_4 &= (15t^{16} - 600t^{14} + 7100t^{12} - 26616t^{10} + 18330t^8 + 6360t^6 + 1980t^4 - 1800t^2 - 945) \div 92160 \\
P_5 &= (3t^{20} - 190t^{18} + 4025t^{16} - 33976t^{14} + 103702t^{12} - 63444t^{10} - 21270t^8 - 7800t^6 + 4455t^4 + 1890t^2 - 17955) \div 368640
\end{aligned} \qquad (2)$$

The simultaneous distribution of two Student variates is exhibited by the
product of two expressions of the form (1).

The distribution of the weighted mean is determined from the statistics
observable in the two samples:


t3 ipod c 
2 ar Mem ese, Ж 
3 ke" (8 


А 
eas ЖЕ E) mim 
. by putting | T= (99, 4-52 @,)/(92-+-88 ) 
and 05) ТАНА n . (4) 
КО, 5 аа 
and obtaining finally Ip = 218$. $ $t; ng TA (5) 
viene оо 


“whence 


‘divisor is 


To obtain explicit expressions for ш, for any given probability we веб 
= hs i 


й—®, = 1%, s,[8, = tan 0 


А 8183 


WE ера (вһ-Еву) = S(t сов b-t; sin 0). 22/8) 


Let

$$t_1\cos\theta + t_2\sin\theta = u, \qquad t_1\sin\theta - t_2\cos\theta = d, \qquad (7)$$

where d is Sukhatme's criterion for the significance of the observed difference between
$\bar{x}_1$ and $\bar{x}_2$. Since d is known, the distribution of u is required conditional upon a given
value of d.
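
The quantities in this section can be computed directly from the two sample means and their estimated standard errors. The sketch below assumes that $s_1$, $s_2$ denote the estimated standard errors of the two means, that the weighted mean is $T = (s_2^2\bar{x}_1 + s_1^2\bar{x}_2)/(s_1^2 + s_2^2)$ and that it is standardised by $S = s_1 s_2/\sqrt{s_1^2 + s_2^2}$; these readings of the damaged formulas, and all function names, are assumptions.

```python
import math
from typing import Tuple


def weighted_mean_statistics(mean1: float, mean2: float,
                             se1: float, se2: float
                             ) -> Tuple[float, float, float, float]:
    """Return (T, S, theta, d) for two means with standard errors se1, se2."""
    total = se1 ** 2 + se2 ** 2
    T = (se2 ** 2 * mean1 + se1 ** 2 * mean2) / total  # inverse-variance weights
    S = se1 * se2 / math.sqrt(total)                   # standard error of T
    theta = math.atan2(se1, se2)                       # tan(theta) = se1 / se2
    d = (mean1 - mean2) / math.sqrt(total)             # Sukhatme's d
    return T, S, theta, d


def rotate(t1: float, t2: float, theta: float) -> Tuple[float, float]:
    """Transformation (7): from (t1, t2) to (u, d)."""
    c, s = math.cos(theta), math.sin(theta)
    return t1 * c + t2 * s, t1 * s - t2 * c
```

For a hypothetical true mean mu, setting t1 = (mean1 - mu)/se1 and t2 = (mean2 - mu)/se2 makes `rotate` return u = (T - mu)/S together with the same d as `weighted_mean_statistics`.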
3. INTEGRATION


We should then substitute

$$t_1 = u\cos\theta + d\sin\theta, \qquad t_2 = u\sin\theta - d\cos\theta \qquad (8)$$

in Student's polynomials. Then

$$\int_{\xi}^{\infty} \phi\, du \div \int_{-\infty}^{\infty} \phi\, du \qquad (9)$$

will give the probability of the deviation of the weighted mean (standardized by dividing
by S) exceeding any chosen value $\xi$, the integrand $\phi$ being the product of two
expressions in the form (1) for $n_1$ and $n_2$ degrees of freedom respectively.

From the way the variates have been transformed from $t_1$, $t_2$ to $u$, $d$ the external
factor

$$\frac{1}{2\pi}\, e^{-\frac{1}{2}(t_1^2 + t_2^2)} \quad\text{becomes}\quad \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}d^2}, \qquad (10)$$

of which the latter factor is common both to numerator and denominator, and may,
therefore, be omitted with the corresponding constant divisor $\sqrt{2\pi}$. The integral of

$$\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2}\, P_r(u\cos\theta + d\sin\theta)\, P_s(u\sin\theta - d\cos\theta), \qquad (11)$$

being that of the exponential multiplied by a polynomial in u, when taken from $-\infty$
to $\infty$ is a function of d and $\theta$ which may be designated by $D_{rs}$, so that the complete
divisor is

$$\sum_{r=0}^{\infty} \sum_{s=0}^{\infty} D_{rs}\, n_1^{-r}\, n_2^{-s}. \qquad (12)$$
t=0 8-0 
We may at once note, writing c and s for the cosine and sine of $\theta$,

$$D_{10} = \{4c^2s^2(d^2 - 1) + s^4(d^4 - 2d^2 - 1)\} \div 4$$
$$D_{01} = \{c^4(d^4 - 2d^2 - 1) + 4c^2s^2(d^2 - 1)\} \div 4 \qquad (13.1)$$
“Ро, in tabular form 
го d d ш 


cs? --192 192 


Perec eB: i) ae fe (13.2) 
os 848-497 —45 +7 .) 
ё 3412 +30 —28 +3 

while Dy, expressed similarly is 

1 ad ds е dq* 

ал 
Өй — 8) —252 122 —90 1 16 a) 
ОТЕ 1 9L— 01 +) 


Expressions for $D_{30}$, $D_{21}$, $D_{12}$, $D_{03}$ are easily found, but will only be needed if adjustments
beyond the third are to be calculated.


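The incomplete integral in the numerator involves terms of the form,

$$\frac{1}{\sqrt{2\pi}} \int_{\xi}^{\infty} e^{-\frac{1}{2}u^2}\, u^m\, du, \qquad (14)$$

which when m is odd is simply

$$z\{\xi^{m-1} + (m-1)\xi^{m-3} + \ldots + (m-1)(m-3)\ldots 2\}, \qquad (15)$$

but when m is even has in addition

$$q\{(m-1)(m-3)\ldots 3\}, \qquad (16)$$

in which expressions z stands for $\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\xi^2}$
and q for

$$\int_{\xi}^{\infty} z\, du. \qquad (17)$$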
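
The reduction of these incomplete integrals follows from one integration by parts, $I_m = \xi^{m-1} z + (m-1) I_{m-2}$ with $I_1 = z$ and $I_0 = q$. A minimal sketch, assuming the normalisation in which z is the standard normal ordinate at $\xi$ and q the upper tail probability (the damaged text does not settle the normalising constant), with a Simpson-rule check; all names are illustrative.

```python
import math


def z(xi: float) -> float:
    """Standard normal ordinate at xi (assumed normalisation)."""
    return math.exp(-xi * xi / 2) / math.sqrt(2 * math.pi)


def q(xi: float) -> float:
    """Upper tail probability, the integral of z from xi to infinity."""
    return 0.5 * math.erfc(xi / math.sqrt(2))


def incomplete_moment(m: int, xi: float) -> float:
    """Integral of u^m z(u) from xi to infinity, via I_m = xi^(m-1) z + (m-1) I_(m-2)."""
    if m == 0:
        return q(xi)
    if m == 1:
        return z(xi)
    return xi ** (m - 1) * z(xi) + (m - 1) * incomplete_moment(m - 2, xi)


def simpson_moment(m: int, xi: float, upper: float = 12.0,
                   steps: int = 20000) -> float:
    """Numerical check of the same integral by Simpson's rule on [xi, upper]."""
    h = (upper - xi) / steps
    total = xi ** m * z(xi) + upper ** m * z(upper)
    for i in range(1, steps):
        u = xi + i * h
        total += (4 if i % 2 else 2) * u ** m * z(u)
    return total * h / 3
```

For odd m only the z term survives at the bottom of the recursion, while for even m the recursion ends in q multiplied by (m-1)(m-3)...3, which is the distinction drawn in (15) and (16).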
4. DIVISION


We may, therefore, express the incomplete integral of the numerator as

$$z \sum_r \sum_s I_{rs}\, n_1^{-r} n_2^{-s} + q \sum_r \sum_s D_{rs}\, n_1^{-r} n_2^{-s}, \qquad (18)$$

and if dividing by

$$\sum_r \sum_s D_{rs}\, n_1^{-r} n_2^{-s} \qquad (19)$$

there remains the quotient

$$q + z \sum_r \sum_s q_{rs}\, n_1^{-r} n_2^{-s}, \qquad (20)$$

then the first operation (division) takes the form

$$\begin{aligned}
q_{10} &= I_{10}, \qquad q_{01} = I_{01} \\
q_{20} &= I_{20} - q_{10}D_{10} \\
q_{11} &= I_{11} - q_{10}D_{01} - q_{01}D_{10} \\
q_{30} &= I_{30} - q_{10}D_{20} - q_{20}D_{10} \\
q_{21} &= I_{21} - q_{10}D_{11} - q_{01}D_{20} - q_{20}D_{01} - q_{11}D_{10}
\end{aligned} \qquad (21)$$

and so on, adjusting the direct integral I at each stage by subtracting all products of
the same weight (r, s) of D and q.


5. ADJUSTMENT OF DEVIATION

The expression

$$zQ = z \sum_r \sum_s q_{rs}\, n_1^{-r} n_2^{-s} \qquad (22)$$

gives the excess probability.

If the probability evaluated is exactly

$$q(x) = q(\xi - F)$$

where F is some function of $\xi$, then

$$\xi = x + F$$

and

$$zQ = q(\xi - F) - q(\xi)$$

or

$$zQ = z\left[F + \tfrac{1}{2}\xi F^2 + \tfrac{1}{6}(\xi^2 - 1)F^3 + \ldots\right], \qquad (23)$$

using the expansion in Hermite polynomials. Hence F may be expressed in terms of
$\xi$ as

$$F = Q - \tfrac{1}{2}\xi Q^2 + \tfrac{1}{6}(2\xi^2 + 1)Q^3 + \ldots \qquad (24)$$

from which may be derived

$$\begin{aligned}
f_{10} &= q_{10}, \qquad f_{01} = q_{01} \\
f_{20} &= q_{20} - \tfrac{1}{2}\xi\, q_{10}^2 \\
f_{11} &= q_{11} - \xi\, q_{10}q_{01} \\
f_{30} &= q_{30} - \xi\, q_{10}q_{20} + \tfrac{1}{6}(2\xi^2 + 1)\, q_{10}^3 \\
f_{21} &= q_{21} - \xi\,(q_{11}q_{10} + q_{20}q_{01}) + \tfrac{1}{2}(2\xi^2 + 1)\, q_{10}^2 q_{01}
\end{aligned} \qquad (25)$$

and

$$x + \sum_r \sum_s f_{rs}\, n_1^{-r} n_2^{-s} \qquad (26)$$

is the deviate cutting off exactly the assigned probability
but in which the polynomials in F are expressed in terms of $\xi$.

Finally to express the successive adjustments in terms wholly of the normal
deviate, known in advance, denoted by x, we put x for $\xi$ and calculate

$$\xi = x + \sum_r \sum_s u_{rs}\, n_1^{-r} n_2^{-s} \qquad (27)$$

with, in detail

$$\begin{aligned}
u_{10} &= f_{10}, \qquad u_{01} = f_{01} \\
u_{20} &= f_{20} + \tfrac{1}{2}\frac{d}{dx} f_{10}^2 \\
u_{11} &= f_{11} + \frac{d}{dx}(f_{10} f_{01})
\end{aligned} \qquad (28)$$

and so on.


The first adjustments calculated in this way are set out in the accompanying
Table.
TABLE OF THE FIRST THREE ADJUSTMENTS


I


The adjustments are successively of the 4-th, 8-th and 12-th degrees in $\cos\theta$
and $\sin\theta$, and of the 3-rd, 5-th and 7-th degrees in x and d


$$u_{10}\, n_1^{-1} = \{(x^3 + x)c^4 + 4d(x^2 + 1)c^3 s + 2(3d^2 - 1)x c^2 s^2 + 4d(d^2 - 1)c s^3\} \div 4n_1$$

$$u_{01}\, n_2^{-1} = \{-4d(d^2 - 1)c^3 s + 2(3d^2 - 1)x c^2 s^2 - 4d(x^2 + 1)c s^3 + (x^3 + x)s^4\} \div 4n_2$$

where, as always, $u_{sr}$ may be obtained from $u_{rs}$ by interchanging c and s and reversing
the sign of d, or by writing c for s, (−s) for c as if ($\pi/2 + \theta$) were written for $\theta$.
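
The first adjustment can be evaluated directly from the two expressions above. The sketch below codes them as read from the damaged print (so the coefficients are an assumption, though they reduce to the familiar Student correction $(x^3 + x)/4n$ when $\theta = 0$) and forms the first-order adjusted deviate $x + u_{10}/n_1 + u_{01}/n_2$. Function names are illustrative.

```python
import math
from typing import Tuple


def first_adjustments(x: float, d: float, theta: float) -> Tuple[float, float]:
    """Return (u10, u01) for normal deviate x, Sukhatme's d and angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    u10 = ((x ** 3 + x) * c ** 4
           + 4 * d * (x ** 2 + 1) * c ** 3 * s
           + 2 * (3 * d ** 2 - 1) * x * c ** 2 * s ** 2
           + 4 * d * (d ** 2 - 1) * c * s ** 3) / 4
    u01 = (-4 * d * (d ** 2 - 1) * c ** 3 * s
           + 2 * (3 * d ** 2 - 1) * x * c ** 2 * s ** 2
           - 4 * d * (x ** 2 + 1) * c * s ** 3
           + (x ** 3 + x) * s ** 4) / 4
    return u10, u01


def first_order_deviate(x: float, d: float, theta: float,
                        n1: float, n2: float) -> float:
    """Normal deviate x corrected by the first adjustment only."""
    u10, u01 = first_adjustments(x, d, theta)
    return x + u10 / n1 + u01 / n2
```

The symmetry stated in the table (interchange c and s, reverse the sign of d) gives a quick consistency check on the two expressions.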

II


For higher adjustments it is convenient to separate the odd and even powers 
of d. The terms to be divided by 96n? are 


even odd. 

25 23 2 gi w? 1 

св +1 5 16 3 
cig + 48d 1 3 2 

6682 - 4 4 23 51 
+ 12d? ` 15 31. 6583  — 964 1 5 8 
+ 32d3 қ 11 13 

2484 + 12 à 2 9 
— 24d? 2 10 39 с355 + 964 е 1 3 
+372d4 А ғ 1 — 6443 . 5 13 
+19245 . , 1 

с286 414442 М i 1 
—240d* . " 1 сїт + 9643 . а 1 
— 9645 D ? 1 


with conjugate terms divided by 96, while with divisor 16717, they are 


even odd 
25 аз “ qi ал "n 
Quot pag 70-24 69 
— 40d2 А 2 72 (с588--с385) — 84 3 13 22 
+108d4 i єл + 1643 621519 
d — 4845 1 
(с6824-с28) — 2 . 5 9 
+ 6d? . 5 17 (с1в—ся1) + 164 4 1 1 
— 4844 ` . 1 — 1643 Я 1 1 
III


The third adjustment has four parts іп two complementary pairs for из and 
Ug, respectively divided by 384nj 


с1052 


ЕСЫ 


C036 


6488 


62510 


а Е ое ш Шы ы сс MDC CUR 1 ВОН Ор SEES ОВО 


— 8 

+ 2442 
— 404% 
+448846 


— 9642 
+ 96d4 
— 662446 


— 96044 
+134446 


24 


134 649 1695 
196 1146 2486 
965 2341 

8 70 225 
56 480 1441 
. 250 839 

1 

10 51 

35 181 


eus 


с983 


6785 


cht 


6389 


сз 


оаа 
ze Ғы ал 1 
+ 264 4 23 39 24 
— 64d 8 55 165 198 
+ 192% 9 33 28 
+ 3844 1 10 40 71 
— 204843 3 14 19 
+ 76845 7 9 
- 3844 1 7 15 
+ 198% 21 146 286 
-1036845 1 2 
+192041 1 
— 25643 5 16 
+ 38445 м 23 
—2304d* 1 
- 38445 .. 1 

ýs 
+ 88447 1 


Normal deviations for chosen probabilities in a single tail are given below for ready reference.

Probability   Normal deviate      Probability   Normal deviate
.5            0                   .01           2.32635
.25           .67449              .005          2.57583
.10           1.28155             .0025         2.80703
.05           1.64485             .001          3.09023
.025          1.95996             .0005         3.29053

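The single-tail deviates listed for ready reference can be reproduced with any inverse normal routine. The following is a minimal sketch using Python's standard library; the helper name `single_tail_deviate` is an illustrative assumption and does not come from the paper.

```python
from statistics import NormalDist


def single_tail_deviate(p: float) -> float:
    """Return the normal deviate x whose upper-tail probability is p."""
    # inv_cdf works on the lower tail, so the upper-tail point is at 1 - p.
    return NormalDist().inv_cdf(1.0 - p)


if __name__ == "__main__":
    for p in (0.5, 0.25, 0.10, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001, 0.0005):
        print(f"{p:<7} {single_tail_deviate(p):.5f}")
```

The one-in-forty point used in the Discussion corresponds to `single_tail_deviate(0.025)`.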

WEIGHTED MEAN OF TWO SAMPLES WITH UNKNOWN VARIANCE RATIO 


and by 384n?n. 


even odd 
gr 25 аз 2 . 26 p а? ДҮ, 
ciis + 192d ў 205,58 2 
ў Ligas 1 3 2 
vacat EI : 77 398 435 
+ 6а . 77 568 931 
— 964+ 4 б 15 31 c93. — 164 28 251 819 1080 
+ 38403 с 10 49 57 
- 38445 гі E 11 13 
с884 Tod 153 1177 5099 11355 
и — 144d? Я 28 188 427 
» + 244+ А в 515 1679 ci + 324 ‚ 52 377 1341 2148 
— 595246 Б . x 1 — 512d? 5 797 136 209 
+ 38445 . A 53 101 
—3840d7 : B à 1 
68686 = 4 36 361 1580 4065 
+ 180d? ' . 35 294 607 
+ — 8% * . 3040 11099 
| --1788046 . . " 1 «517 — 1924 4 39 145 246 
” ў PTY оз 487 818 
— 38445 . А 61 + 137 
+768047 . . б 1 
| с148 + 12 . 14 103 225 
— 1204 . 14 121 319 
> + x 4d* n % v 3085 11813 
-1180846 . . 1 сә + 1924 3 17 24 
— 19843 ы 15 103 161 
- 691245 К 2 1 2 
i —2304d7 я x à 1 
4»! chao + 144d? . АС 5 13 
- 484% E б 25 93 
+ 172846 1 * ` 1 сәл + 88443 B ; 1 i 
№ — 38445 > E 1 ! 
6. DISCUSSION


It appears unmistakably that the quantity d, defined by Sukhatme for
implementing Behrens’ test of the significance of the difference between the two observed
means, is a major factor in determining the precision with which the weighted mean,
with appropriate adjustments, can be used for the estimation of the common mean,
assumed by hypothesis, of the two populations sampled. The analysis implies that
the result of applying Behrens’ test is not such that the hypothesis of a common mean
is abandoned; it does not imply that the difference is not formally significant at some
of the standard levels.


The importance of d in the formulae here developed is that the experimenter
is led to realize that, if the populations have a common mean, but at the same time 
there has by chance occurred a somewhat wide discrepancy between the means observed, 
then the precision with which this common mean can be estimated from the two 
samples available is much lower than it would have been had the two means come 
in close agreement. This is really to be expected for two reasons: 


(a) If the two means are close together, the weights attached to each in 
making a common estimate make little difference to the estimate obtained, which will 
be in this case principally liable only to nearly equal errors in the same direction of the 
means of two independent samples. Whereas with a large discrepancy the value of 
the common estimate will be much affected by the relative importance attached to 
the two pieces of evidence, and this, with small samples, is not well determined.


(b) A large discrepancy between the two means, in relation to the variation 
within samples, is in itself evidence that the mean squares observed within samples
are both too low, and that the precision will be overestimated if the discrepancy is 
ignored. This information is absent on the hypothesis that the true means have 
an unknown difference. 


A numerical example will perhaps make the position clearer. If we wish
to locate the point which the true common mean has a probability of one in forty
of exceeding, we should take, on large sample theory, the value

x̄ + (1.95996) S.


The coefficient of S is modified by the first three corrective terms as follows
in the case n₁ = 15, n₂ = 20, cos² θ = 2/3, and a range of values of d, as shown below.


The successive values for the adjusted coefficient seem to show a satisfactory
convergence for the values of (n₁, n₂) chosen. The coefficients after three adjustments
seem usually to be correct to two places of decimals and in the central region to three. 
As in other cases, it is to be expected that convergence will be slower at higher levels 
of significance, but more rapid for larger samples, with which, however, the influence 
of d may still be very considerable. 
   d    first       coefficient   second      coefficient   third       coefficient
        adjustment                adjustment                adjustment
  3.0    .80853     2.76849        .15687     2.92536        .00226     2.92762
  2.5    .62419     2.58415        .08344     2.66759       -.00968     2.65791
  2.0    .45831     2.41827        .03568     2.45395       -.01012     2.44383
  1.5    .31483     2.27479        .00861     2.28340       -.00623     2.27717
  1.0    .19768     2.15764       -.00280     2.15484       -.00234     2.15250
  0.5    .11078     2.07074       -.00435     2.06639       -.00028     2.06611
  0.0    .05806     2.01802       -.00166     2.01636        .00006     2.01642
 -0.5    .04345     2.00341        .00013     2.00354       -.00018     2.00336
 -1.0    .07089     2.03085       -.00323     2.02762        .00013     2.02775
 -1.5    .14428     2.10424       -.01460     2.08964        .00095     2.09059
 -2.0    .26758     2.22754       -.03506     2.19248       -.00007     2.19241
 -2.5    .44470     2.40466       -.06345     2.34121       -.00880     2.33241
 -3.0    .67957     2.63953       -.09618     2.54335       -.03556     2.50779
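Each coefficient in the table is the large-sample value 1.95996 plus the adjustments accumulated up to that column. The sketch below, assuming three rows transcribed from the table (d = 3.0, 0.0 and -3.0) and a hypothetical helper `adjusted_coefficients`, shows the arithmetic; it is a bookkeeping check only and does not recompute the adjustments themselves.

```python
BASE = 1.95996  # large-sample coefficient of S for the one-in-forty point

# (d, first, second, third adjustment), transcribed from three rows of the table
ROWS = [
    (3.0, 0.80853, 0.15687, 0.00226),
    (0.0, 0.05806, -0.00166, 0.00006),
    (-3.0, 0.67957, -0.09618, -0.03556),
]


def adjusted_coefficients(adjustments, base=BASE):
    """Return the running totals base + a1, base + a1 + a2, ... to 5 decimals."""
    totals, running = [], base
    for a in adjustments:
        running += a
        totals.append(round(running, 5))
    return totals


if __name__ == "__main__":
    for d, *adj in ROWS:
        print(d, adjusted_coefficients(adj))
```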


An experimenter who can obtain values of a physical constant by two different 
methods, the sources of error in which are unrelated, may wish to review his data 
from two distinct points of view : 


(a) That he has full assurance that his theoretical formulation is correct,
and that his experimental procedures are both free from systematic error. In this
case any apparent discrepancy between the two mean values is ascribed confidently
to random sampling errors only, and fiducial limits for the common mean at any chosen
level of probability may be obtained by the formulae of the present paper.


(b) That he does not exclude the possibility that his two methods would lead,
if indefinitely repeated, to two different average values; that he will admit this
possibility without knowing to what such a discrepancy might be due, or being able to
compensate or allow for it. In this case, a large value for Sukhatme’s d is not evidence
of lower precision. Indeed, as the discrepancy is not known to be wholly due to errors
of random sampling, the weighted mean has no special merit.


A statistical formulation appropriate to this case is that if μ₁ and μ₂ are the
two population means of the two methods, there is one relevant quantity of which
estimation is possible, namely

(μ₁ + μ₂)/2,
which presents a simpler problem than the weighted mean, for its error curve is
symmetrical, and is indeed supplied by Behrens' test of significance. The error of the
estimate, (x̄₁ + x̄₂)/2, is (t₁s₁ + t₂s₂)/2, which, since the t distribution is
symmetrical, is just half the variate tabulated for Behrens’ test, when multiplied
by √(s₁² + s₂²).
A NOTE ON MUTUALLY ORTHOGONAL LATIN SQUARES


By S. S. SHRIKHANDE
Banaras Hindu University, India


SUMMARY. It is proved that the existence of a set of (n − 3) Mutually Orthogonal Latin
Squares (MOLS) of order n implies the existence of a complete set of (n − 1) such squares and hence the
existence of a finite projective plane PG(2, n).


1. INTRODUCTION 


A Latin Square of order n is an n × n matrix whose entries are from a set of
n distinct symbols such that each symbol occurs exactly once in each row and column.
Two Latin Squares $A = (a_{ij})$, $B = (b_{ij})$ of order n are mutually orthogonal if the n²
ordered pairs $(a_{ij}, b_{ij})$ are all distinct. A set $A_1, A_2, \dots, A_m$ of Latin Squares of order
n is called orthogonal if $A_i$ and $A_j$ are orthogonal for all i ≠ j. It is easy to show
that m ≤ n − 1. An orthogonal set is said to be complete if m = n − 1.
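The definitions of a Latin Square and of orthogonality translate directly into a check on ordered pairs. The sketch below is illustrative only: the function names are hypothetical, and the construction L_m(i, j) = (m·i + j) mod p is the standard one for prime order p, not a method taken from this note.

```python
from itertools import combinations


def is_latin_square(sq):
    """True if every symbol 0..n-1 occurs exactly once in each row and column."""
    n = len(sq)
    symbols = set(range(n))
    rows_ok = all(len(row) == n and set(row) == symbols for row in sq)
    cols_ok = all({sq[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """True if the n*n ordered pairs (a[i][j], b[i][j]) are all distinct."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n


def complete_set_prime(p):
    """Squares L_m(i, j) = (m*i + j) mod p for m = 1..p-1 (p assumed prime)."""
    return [[[(m * i + j) % p for j in range(p)] for i in range(p)]
            for m in range(1, p)]


if __name__ == "__main__":
    squares = complete_set_prime(5)
    print(len(squares), all(are_orthogonal(a, b)
                            for a, b in combinations(squares, 2)))
```

For p = 5 this yields m = n − 1 = 4 squares, the upper bound stated above.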


Bose (1938) and Levi (1942) independently showed the equivalence of complete 
sets of such squares of order n to the finite projective planes PG(2, n). 


A Balanced Incomplete Block design (BIBD) is an arrangement of v elements
into b sets of k (< v) distinct elements such that every pair of different elements occurs
in exactly λ sets. Then it is easy to show that every element occurs in exactly r of
these sets. The numbers v, b, r, k, λ are called the parameters of the design. It is
known that the existence of a complete set of squares of order n is equivalent to the
existence of a BIBD with parameters

$v = n^2,\quad b = n^2 + n,\quad r = n + 1,\quad k = n,\quad \lambda = 1.$
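Any BIBD must satisfy the two counting identities bk = vr and λ(v − 1) = r(k − 1). As a small sanity check, assuming the parameters v = n², b = n² + n, r = n + 1, k = n, λ = 1 of the design equivalent to a complete set of squares, the following sketch (with hypothetical helper names) confirms both identities; it checks necessary conditions only and says nothing about existence.

```python
def bibd_identities_hold(v, b, r, k, lam):
    """Necessary counting conditions for a BIBD with parameters (v, b, r, k, lam)."""
    return b * k == v * r and lam * (v - 1) == r * (k - 1)


def plane_parameters(n):
    """Parameters (v, b, r, k, lam) of the BIBD tied to a complete set of order n."""
    return (n * n, n * n + n, n + 1, n, 1)


if __name__ == "__main__":
    for n in (2, 3, 4, 5, 7):
        print(n, plane_parameters(n), bibd_identities_hold(*plane_parameters(n)))
```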


Bose and Nair (1939) generalised the notion of a BIBD to a Partially Balanced
Incomplete Block design (PBIBD). These new designs with 2-associate classes have
been classified by Bose and Shimamoto (1952) according to their association schemes.
Let v = n² treatments be exhibited in a square array of order n. If a set of i − 2
MOLS of order n exists, then the association scheme of a PBIBD for these n²
treatments is said to be of type $L_i$, if any two treatments are 2-associates if and
only if they occur in the same row or column, or correspond to the same symbol
of any one of the Latin Squares. Then from Bose and Shimamoto (1952) we have

$n_1 = (n-1)(n-i+1), \quad n_2 = i(n-1),$
$p^1_{11} = (n-i)^2 + i - 2, \quad p^2_{11} = (n-i)(n-i+1).$
2. MAIN RESULT


Theorem: Any set of (n − 3) MOLS of order n can be uniquely extended to
a complete set of (n − 1) such squares, if n ≠ 4.

Proof: Suppose (n − 3) MOLS of order n exist. Then we can get a PBIB design
with association scheme $L_{n-1}$ by forming blocks of size n, corresponding to the rows,
columns or the same symbols of each of the (n − 3) Latin Squares. The parameters
of this design are given by

$v = n^2,\quad b = n(n-1),\quad r = n-1,\quad k = n,\quad \lambda_1 = 0,\quad \lambda_2 = 1,$   ... (2.1)
$n_1 = 2(n-1),\quad n_2 = (n-1)^2;\quad p^1_{11} = n-2,\quad p^2_{11} = 2.$


From Shrikhande (1959), it follows that the association scheme of the above design
is of type $L_2$, i.e. the n² treatments can be uniquely arranged into a square array of
order n, such that any two treatments in the same row or same column are 1-associates,
otherwise they are 2-associates. Forming 2n blocks corresponding to the rows and
columns of this array we obviously get a BIBD with parameters

$v = n^2,\quad b = n^2 + n,\quad r = n + 1,\quad k = n,\quad \lambda = 1$

which in turn implies the existence of a complete set of squares of order n. It is
obvious that if we apply the same correspondence which was used to obtain the blocks
of (2.1), then the added n blocks corresponding to rows (columns) give a Latin Square,
such that these Latin Squares are mutually orthogonal and also orthogonal to the
previous set of (n − 3) such squares.


Using Bruck and Ryser’s Theorem (1949), we have:


Corollary 1: If n ≡ 1 or 2 (mod 4) and the square-free part of n contains a
prime 4t + 3, then there do not exist (n − 3) MOLS of order n.


Similarly in the notation of Silverman (1960), we have the following 
refinement of his Theorem 4.8. 


Corollary 2: If n ≡ 1 or 2 (mod 4) and the square-free part of n contains
a prime 4t + 3, then 42 (n) is not r-orthogonal for k > n + r − 3.
A STUDY OF BIB DESIGNS WITH REPLICATIONS 11 TO 15
By C. RADHAKRISHNA RAO
Indian Statistical Institute
SUMMARY. The combinations of parameters for BIB designs with replications 11 to 15 have
been listed and the actual solutions of the cyclic type have been given in a number of cases. It may be
observed that Fisher and Yates (1953) give the solutions to BIB designs up to 10 replications only.


1. INTRODUCTION 

Sir Ronald Fisher, during his visit to the Indian Statistical Institute in the
winter of 1960-61, suggested to the author that a study may be undertaken of the
BIB designs with replications 11 to 15. He also gave a list of possible combinations
of the parameters v, b, k, r and λ for such designs, except those derivable by doubling
a known design with half the number of replications. This list is given in Table 1
together with the reference numbers, which are assigned following the convention
of Fisher and Yates (1953). The designs derivable from geometrical configurations,
denoted by o.s. (orthogonal squares) and o.c. (orthogonal cubes) in Fisher and
Yates (1953) and Fisher (1945), are given a more explicit representation, indicating the
particular geometrical configurations from which the solutions are obtained. The
last non-geometrical design listed in Fisher and Yates (1953) has the reference no. 31
and, therefore, the non-geometrical designs in the present list of Table 1 are numbered
consecutively from 32, first with increasing values of v within a replication (r), and
then with increasing values of r. The actual numbers are written only in cases
where the solutions have been found.


It may be seen that in a number of cases, denoted by (—), the solution is unknown.
In some cases solutions are derived by the methods given by Bose (1939) and Rao
(1945, 1946) in so far as they are applicable. Other solutions are obtained by trial
and error. The impossibility of certain designs has been established by using the results
of Bruck and Ryser (1949), Schützenberger (1949), Chowla and Ryser (1950),
Shrikhande (1950), and Hall and Connor (1954). Since their results and the author's
(Rao, 1944, 1945, 1946) on the cyclic representations of geometrical designs are
published mostly in journals on mathematics, the relevant theorems are quoted in
the next section for ready reference.
Some of the solutions obtained by trial and error and listed in Table 3 can
be derived as particular cases of general combinatorial problems. For instance, a
combinatorial assignment problem of which no. 66 in Table 1 is a special case may be
stated as follows. In an establishment, there are v officers, r departments and s types
of jobs in each department. Every officer has to be assigned a job in each
department such that, in any department, there are equal numbers of officers in different
jobs and any two officers have common jobs in exactly λ departments. When
v = s² and s is a prime power a solution exists with λ = 1. For s = 6, no solution
is possible with λ = 1, but a solution can be found with λ = 2 (derivable from the
BIB design no. 66). It is of some interest to determine for any given number v
the minimum λ for which the assignment problem is soluble. All the numbers can
then be characterised by the associated minimum λ.
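For the case v = s² with λ = 1, one way to see that a solution exists is the usual affine-plane assignment. The sketch below is an illustration under stated assumptions: s is taken to be a prime (so that arithmetic mod s is a field; prime powers would need Galois-field arithmetic), officers are labelled (x, y), and the function names are hypothetical rather than taken from the paper.

```python
from itertools import combinations, product


def assign_jobs(s):
    """Map each officer (x, y) to a list of jobs, one per department.

    Departments 0..s-1 use job (y - m*x) mod s; one extra department uses job x.
    s is assumed prime, giving r = s + 1 departments and s job types in each.
    """
    jobs = {}
    for x, y in product(range(s), repeat=2):
        jobs[(x, y)] = [(y - m * x) % s for m in range(s)] + [x]
    return jobs


def common_department_counts(jobs):
    """Set of values taken by 'number of departments with a common job' over all pairs."""
    return {sum(ja == jb for ja, jb in zip(jobs[a], jobs[b]))
            for a, b in combinations(jobs, 2)}


if __name__ == "__main__":
    print(common_department_counts(assign_jobs(5)))
```

Every pair of officers shares a job in exactly one department, which is the λ = 1 condition of the assignment problem.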
TABLE 1. COMBINATIONS OF THE PARAMETERS FOR BIB DESIGNS WITH
REPLICATIONS 11 TO 15 


юр в А Ёш. 
Аа ЕТТЕ ВВ ТЕПТІ ЕТІ 
ДАЛЫ ON des oc 1n 18:3 88 ТО 1:68 
а 18 E o AATE аш а а) 
29:98:05 31:17 10/4 054 095 ОРИ 1-66 
В Тр 516,86 36 м в м 2 66(4 
МЕТ ТҮС S OP im 446 7 м 2 674) 
Cer Seca HS IDA NA m 68 9 12 м 23  "(Th2) 
109) ALO ido CIS БАН: $57 war 7 зл = 
nd e ЫТЫ АЛЫ te о 92 4 4 2 Thi) 
Я РЕС 741 9 182 13 м 1 води 
ТӨЗЕЛ ЙГЕ СЕ LE 249 183 183 14 м 1 Р(2,13)1 
Ni c6 238. 9748 1D Fees SMe s dg 
2 38 8 12 4 — {990 9995 . Еу ИЄ 
36 100 3 12 1 45 16135580: Bek L. 578 
$3244: 0.9/0 119,5. g^ em 18/548 IB чм 
a s 12 а 4 (ТЪЛ) 182/40: 328 РГ ДБ 
FEES ERNST ALD AAS 16. 30. в шт  E(439)5 
ade 198 ы 101g. 5-2 ОЕ отеу вв 
6560 12 2 *(Th2) 30/4 COB Ив 7185. 92717 
Шы Кое 19 d ee aa в mm 
81045007 12. „719 8 %Тһ4) 81 155 3 15 1 P (4, 2):1 
іі 13 п 1 1 вол ài 93 5 15. 2 7 
в 19 в в 1 Pid 31 31 15 15 7 Р(4,92 
27 17 13 1 E($34 30,1392 Ton oes А а 
27299 9 13 4  E(,3)2 MAS CHR ЖЫР ЕЕЗ 
SRG AT тз 59 1817-16... O3, dd “6G Ве: 
40 10 ^ 4 18 1 Разд а 
аав В Е Sn UBI 18 OREL bee 
40 40 13 13 4 P(3,3):2 ав ee Betas 
з з 1213 3  "(Th3) CERE в Be ао сш 
а ва в. | ТН ТО Тв 
ооа | 91 105. 13 15 2  *(Th2) 
MOTO de ds vea us | 100 6 15 15 2  *(Th1) 
V AE ag odo 7g 130204, оао 
DBT xo LOTS AB 501890152 | 196 210 14 15° 1  *(Th3) 
15 42 5 мМ 4 6:44 211 21] 15 15 1  *(Th6) 


—solution unknown, * solution does not exist, ? presumably non-existing, (d) double of insoluble type. 


P and E have been used for PG and EG, the Projective and Euclidean finite geometrical
configurations whose construction is considered in Section 3.


2. THEOREMS ON NON-EXISTENCE OF CERTAIN BIB DESIGNS 


Theorem 1: (Schützenberger, 1949; Chowla and Ryser, 1950; Shrikhande,
1950): For the existence of a symmetrical BIBD with parameters v, b, k, r, λ, a
necessary condition is that (r − λ) is a perfect square, when v is even.


Theorem 2: (Hall and Connor, 1954; Shrikhande, 1960): The existence of
the BIBD, v − r, b − 1, r − λ, k, λ, implies that of the BIBD, v = b, k = r, λ, when
λ = 1 or 2.


For example, the designs 68 and 70 with the parameters

v = 78, b = 91, k = 12, r = 14, λ = 2

v = 92, b = 92, k = 14, r = 14, λ = 2

do not exist. By Theorem 1, the latter does not exist as (k − λ) = 12 is not a perfect
square, and Theorem 2 excludes the possibility of the former. Similarly the designs
88 and 89 do not exist. No. 47 does not exist by Theorem 1, but this does not
necessarily imply the non-existence of 44.


Theorem 3: (Chowla and Ryser, 1950): If v ≡ 1 mod 4 and there exists an
odd prime p such that p divides the square free part of k − λ, and, moreover, if the Legendre
symbol (λ|p) = −1, then the symmetric BIBD does not exist.


Theorem 4: (Chowla and Ryser, 1950): If v ≡ 3 mod 4 and if there exists
an odd prime p such that p divides the square free part of k − λ, and moreover, if
(−λ|p) = −1, then the symmetric BIBD has no solution.


For the design 52, (v = 67 = b, r = 12 = k, λ = 2), v ≡ 3 mod 4,
k − λ = 10 and p = 5 is an odd prime dividing the square free part of 10. We find
−λ = −2 is a quadratic non-residue of 5, so that the Legendre symbol (−2|5) = −1.
Hence, by Theorem 4, the symmetrical design 52 does not exist. By Theorem 2,
50 does not exist. By an application of Theorem 3, it can be shown in a similar
way that 55 does not exist. This does not imply the non-existence of 54, as Theorem 2
is not applicable for λ = 3.
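The non-existence tests of Theorems 1, 3 and 4, as quoted above, are mechanical once the Legendre symbol is available. The following sketch is one possible encoding, with hypothetical function names; the Legendre symbol is computed by Euler's criterion, and a result of `False` means only that these three theorems do not exclude the design, not that it exists.

```python
from math import isqrt


def odd_exponent_primes(n):
    """Primes dividing n to an odd power, i.e. the primes of its square-free part."""
    primes, p = [], 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e % 2:
            primes.append(p)
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def legendre(a, p):
    """Legendre symbol (a|p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def excluded_symmetric(v, k, lam):
    """True if Theorem 1, 3 or 4 (as quoted) rules out a symmetric (v, k, lam) design."""
    m = k - lam
    if v % 2 == 0:                      # Theorem 1
        return isqrt(m) ** 2 != m
    target = lam if v % 4 == 1 else -lam  # Theorem 3 or Theorem 4
    return any(p != 2 and legendre(target, p) == -1
               for p in odd_exponent_primes(m))


if __name__ == "__main__":
    print(excluded_symmetric(92, 14, 2))   # design 70
    print(excluded_symmetric(67, 12, 2))   # design 52
```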


Theorem 5: (Bruck and Ryser, 1949): If n ≡ 1 or 2 mod 4 and p, a prime
dividing the square free part of n, is of the form 4k + 3, then (n − 1) mutually orthogonal
squares do not exist.


Designs 91 and 92 are not possible for they imply the existence of 13 mutually
orthogonal squares of order 14, which is impossible by Theorem 5.
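The condition of Theorem 5 is easy to test for a given order n. The sketch below, with hypothetical function names, factorises n by trial division and reports whether the theorem rules out a complete set of n − 1 squares; orders that pass the test are not thereby shown to admit such a set.

```python
def odd_exponent_primes(n):
    """Primes dividing n to an odd power, i.e. the primes of its square-free part."""
    primes, p = [], 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e % 2:
            primes.append(p)
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def bruck_ryser_excludes(n):
    """True if Theorem 5 rules out n - 1 mutually orthogonal squares of order n."""
    if n % 4 not in (1, 2):
        return False
    return any(p % 4 == 3 for p in odd_exponent_primes(n))


if __name__ == "__main__":
    print([n for n in range(2, 31) if bruck_ryser_excludes(n)])
```

Order 14 is excluded because 14 ≡ 2 mod 4 and its square-free part contains the prime 7 = 4·1 + 3.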
3. CYCLIC DESIGNS DERIVABLE FROM GEOMETRICAL CONFIGURATIONS


A finite projective geometry of t dimensions with coordinates as elements
of a Galois Field GF(s) is represented by PG(t, s) and the corresponding Euclidean
geometry by EG(t, s). Bose (1959) observed that by choosing the points as varieties
and all d dimensional flats as blocks from either PG(t, s) or EG(t, s) one can generate
a BIBD. Designs so obtained are represented by PG(t, s) : d or EG(t, s) : d under the
reference number in Table 1. The parameters of such designs are:


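        PG(t, s) : d                  EG(t, s) : d
v       $(s^{t+1} - 1)/(s - 1)$       $s^t$
b       $\phi(t, s, d)$               $s^{t-d}\,\phi(t-1, s, d-1)$
k       $(s^{d+1} - 1)/(s - 1)$       $s^d$
r       $\phi(t-1, s, d-1)$           $\phi(t-1, s, d-1)$
λ       $\phi(t-2, s, d-2)$           $\phi(t-2, s, d-2)$

where

$\phi(t, s, d) = \dfrac{(s^{t+1} - 1)(s^{t} - 1)\cdots(s^{t-d+1} - 1)}{(s^{d+1} - 1)(s^{d} - 1)\cdots(s - 1)}.$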

Such geometries can be constructed for s equal to a prime or a prime power and,
therefore, the designs in Table 1 with reference numbers of the type PG(t, s) : d
and EG(t, s) : d exist.
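The parameters of PG(t, s) : d and EG(t, s) : d follow from counting flats. The sketch below assumes the usual count φ(N, s, m) of m-dimensional flats in PG(N, s), with φ taken as 1 when m = −1 so that λ = 1 for d = 1; the function names `phi`, `pg_design` and `eg_design` are illustrative, and the tuples are returned in the order (v, b, k, r, λ).

```python
def phi(N, s, m):
    """Number of m-dimensional flats in PG(N, s); the empty product gives 1 for m = -1."""
    num = den = 1
    for j in range(m + 1):
        num *= s ** (N + 1 - j) - 1
        den *= s ** (j + 1) - 1
    return num // den


def pg_design(t, s, d):
    """(v, b, k, r, lam) of the BIBD PG(t, s) : d."""
    return (phi(t, s, 0), phi(t, s, d), phi(d, s, 0),
            phi(t - 1, s, d - 1), phi(t - 2, s, d - 2))


def eg_design(t, s, d):
    """(v, b, k, r, lam) of the BIBD EG(t, s) : d."""
    return (s ** t, s ** (t - d) * phi(t - 1, s, d - 1), s ** d,
            phi(t - 1, s, d - 1), phi(t - 2, s, d - 2))


if __name__ == "__main__":
    print(pg_design(3, 3, 2))
    print(eg_design(4, 2, 3))
```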


We shall demonstrate how cyclic solutions can be obtained in a simple manner
for all designs derivable from finite geometrical configurations. The main results
are contained in Rao (1944, 1945, 1946) where the theorems of Singer (1938) and
Bose (1942) have been generalised and applied to the construction of cyclic
solutions to BIB designs. The theorems quoted in this section are from Rao (1946), which
is devoted to the construction of cyclic solutions to BIB designs derivable from
cyclic representations of finite geometrical configurations.


Theorem 6: Let $x^{t+1} - a_t x^{t} - \dots - a_0$ be a minimum function generating the
elements of GF($s^{t+1}$) and the sequence $\xi_d$ [d = 0, 1, ..., $(s^{t+1} - s)/(s - 1)$], be derived
from the recurrence relation

$\xi_{d+t+1} = a_t\,\xi_{d+t} + \dots + a_0\,\xi_d$

with the initial values $\xi_0 = \dots = \xi_{t-1} = 0$, $\xi_t = 1$. The set of integers d such that
$\xi_d = 0$ constitutes a difference set mod $v = (s^{t+1} - 1)/(s - 1)$.


The number of d's such that $\xi_d = 0$ is $k = (s^{t} - 1)/(s - 1)$ and among the
k(k − 1) differences reduced mod v, each integer less than v occurs $\lambda = (s^{t-1} - 1)/(s - 1)$
times. The difference set provides a cyclic solution to the BIBD, PG(t, s) : (t − 1)
with the parameters

$v = b = \dfrac{s^{t+1} - 1}{s - 1}, \quad r = k = \dfrac{s^{t} - 1}{s - 1}, \quad \lambda = \dfrac{s^{t-1} - 1}{s - 1}.$
The difference sets for t = 2 and s = 3, 4, 5, 7, 8, 9 are given in Fisher and
Yates (1953) and also in Rao (1946). The solutions for the following are obtained
using the minimum functions indicated.


ref. no.         parameters                           minimum function
PG(2,11) : 1     v = b = 133, r = k = 12, λ = 1       a3—32--1
PG(2,13) : 1     v = b = 183, r = k = 14, λ = 1       x34 2w-+2*
PG(3,3) : 2      v = b = 40, r = k = 13, λ = 4        x⁴ − 2x³ − 2x² − x − 1
PG(4,2) : 3      v = b = 31, r = k = 15, λ = 7        x⁵ − x² − 1


We shall demonstrate the method for PG(3,3) : 2. The initial values are
$\xi_0 = \xi_1 = \xi_2 = 0$, $\xi_3 = 1$ and the rest $\xi_4$ to $\xi_{39}$ are obtained from the recurrence relation

$\xi_{d+4} = 2\xi_{d+3} + 2\xi_{d+2} + \xi_{d+1} + \xi_d.$

Thus $\xi_4 = 2$, $\xi_5 = 0$, $\xi_6 = 2$, ... and so on. The suffixes corresponding to zero values
are

(0, 1, 2, 5, 12, 18, 22, 24, 26, 27, 29, 32, 33)

which constitute a difference set mod 40. On cyclic development we obtain the BIBD,
PG(3,3) : 2, where the varieties are numbered 0 to 39. Similarly we find the cyclic
solution to PG(4, 2) : 3

(0, 1, 2, 3, 5, 6, 8, 11, 12, 18, 19, 20, 23, 27, 29) mod 31.

Two difference sets for this design obtained by alternative methods are

(1, 2, 4, 5, 7, 8, 9, 10, 14, 16, 18, 19, 20, 25, 28) mod 31

due to Bose (1939) and

(1, 2, 3, 4, 6, 8, 12, 15, 16, 17, 23, 24, 27, 29, 30) mod 31

due to Marshall Hall (1958). The last two are non-isomorphic and their relation to
the first solution has not been investigated.
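The recurrence of Theorem 6 is simple to run mechanically. The sketch below is a minimal illustration, assuming s is a prime (so that arithmetic mod s suffices) and using hypothetical helper names; coefficients are passed as (a_0, ..., a_t). It regenerates the zero positions for PG(3,3) : 2 from x⁴ − 2x³ − 2x² − x − 1 and counts how often each non-zero residue arises as a difference.

```python
from collections import Counter


def recurrence_sequence(coeffs, s, length):
    """xi_0..xi_{length-1} for x^{t+1} - a_t x^t - ... - a_0, coeffs = (a_0, ..., a_t)."""
    order = len(coeffs)
    xi = [0] * (order - 1) + [1]          # xi_0 = ... = xi_{t-1} = 0, xi_t = 1
    while len(xi) < length:
        xi.append(sum(a * x for a, x in zip(coeffs, xi[-order:])) % s)
    return xi[:length]


def zero_positions(coeffs, s, v):
    """Suffixes d in 0..v-1 with xi_d = 0."""
    return [d for d, x in enumerate(recurrence_sequence(coeffs, s, v)) if x == 0]


def difference_counts(dset, v):
    """How often each non-zero residue mod v occurs among the differences of dset."""
    return Counter((a - b) % v for a in dset for b in dset if a != b)


if __name__ == "__main__":
    d_set = zero_positions((1, 1, 2, 2), 3, 40)   # PG(3,3) : 2
    print(d_set)
    print(set(difference_counts(d_set, 40).values()))
```

Each of the 39 non-zero residues mod 40 occurs λ = 4 times, as the theorem requires.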
Theorem 7: Let $x^{t} - a_{t-1}x^{t-1} - \dots - a_0$ be a minimum function generating
GF($s^t$), and the sequence $\xi_d$ [d = 0, ..., $s^t - 1$], be derived from the recurrence relation:

$\xi_{d+t} = a_{t-1}\,\xi_{d+t-1} + \dots + a_0\,\xi_d$

with the initial values $\xi_0 = \xi_1 = \dots = \xi_{t-2} = 0$, $\xi_{t-1} = 1$. The set of integers d such
that $\xi_d = a$ (fixed) ≠ 0 provides a difference set such that among the differences mod
($s^t - 1$), all integers less than ($s^t - 1$) and not divisible by $\theta = (s^t - 1)/(s - 1)$ occur an
equal number, ($s^{t-2}$), of times and those divisible by θ, zero times.


*Minimum functions of certain orders are given in Carmichael (1937). Some algebraic methods
of deriving minimum functions of the second order and, in a few cases, of the third order have been given
by Bose, Chowla and Rao (1944, 1945a, 1945b). The third order minimum functions for s = 13 have not
been reported anywhere in the literature. At the request of my colleague Dr. I. M. Chakravorty, Dr. Jack
Alanen found some solutions with the help of a Burroughs 220 computer. I wish to thank Dr. Alanen
for supplying a few minimum functions of the third order for s = 13, of which I am quoting one. A note
by Dr. Alanen is appearing in this issue of Sankhyā.
Such a difference set provides a compact representation of the resolvable
BIBD, EG(t, s) : (t − 1), with the parameters

$v = s^t,\quad b = (s^{t+1} - s)/(s - 1),\quad k = s^{t-1},\quad r = (s^t - 1)/(s - 1),\quad \lambda = (s^{t-1} - 1)/(s - 1).$

The $s^t$ varieties are represented by the residues 0, 1, ..., $s^t - 2$ and an element
∞ which is invariant under addition with the residues. Let $d_1, \dots, d_k$ be the difference
set obtained as in Theorem 7. Consider the (s − 1) sets derived by adding 0, θ, ...,
(s − 2)θ to each element of the initial set, i.e., in the cyclic development of ($d_1, \dots, d_k$)
we take the sets obtained after 0, θ, 2θ, ... steps. This operation may be denoted by

[($d_1, \dots, d_k$) S(θ)] mod ($s^t - 1$).

To this add another set consisting of the remaining (k − 1) residues of ($s^t - 1$) and an
invariant element ∞. This operation is denoted by

[($d_1, \dots, d_k$) S(θ) + R] mod ($s^t - 1$)

which gives one complete replication of the resolvable BIBD. The other replications
are obtained by adding 1, 2, ..., (θ − 1) to each member of the first replication. The
procedure for developing the whole solution of the resolvable design may be
represented as

PC(θ) [($d_1, \dots, d_k$) S(θ) + R] mod ($s^t - 1$)

where PC indicates a partial cycle, up to θ only.
As an example let us consider EG(4, 2) : 3 and the minimum function x⁴ − x³ − 1.
We find

$\xi_0 = \xi_1 = \xi_2 = \xi_7 = \xi_9 = \xi_{12} = \xi_{13} = 0$
$\xi_3 = \xi_4 = \xi_5 = \xi_6 = \xi_8 = \xi_{10} = \xi_{11} = \xi_{14} = 1$

and the set of d, such that $\xi_d = 1$,

(3, 4, 5, 6, 8, 10, 11, 14)

is a difference set mod 15 with the property stated in Theorem 7. One replication of
the resolvable BIBD, v = 16, b = 30, k = 8, r = 15, λ = 7, is

[(3, 4, 5, 6, 8, 10, 11, 14) S(15) + R] mod 15

which on development yields one replication

(3, 4, 5, 6, 8, 10, 11, 14)
(∞, 1, 2, 7, 9, 12, 13, 15)

and the other 14 replications are obtained by adding 1, 2, ..., 14 to the elements of the
first replication and reducing to mod 15. Similarly solutions are obtained for the
following and given in Table 2.


ref. no.         parameters of resolvable BIBD       minimum function
                 v      b      k     r     λ
EG(2,11) : 1     121    132    11    12    1         z3—4z4-2, GF(11)
EG(3,3) : 2      27     39     9     13    4         292, GF(3)
EG(2,13) : 1     169    182    13    14    1         z?—z—1, GF(13)
EG(4,2) : 3      16     30     8     15    7         x⁴ − x³ − 1, GF(2)
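The development described for EG(4, 2) : 3 can be followed step by step in code. The sketch below is a minimal illustration under stated assumptions: s is prime, the coefficients are given as (a_0, ..., a_{t-1}), the string `"inf"` stands for the invariant element ∞, residue 0 is used where the text prints 15, and all function names are hypothetical. It has only been checked against the paper's example x⁴ − x³ − 1 over GF(2).

```python
from collections import Counter
from itertools import combinations

INF = "inf"  # stands for the invariant element


def theorem7_set(coeffs, s, value=1):
    """Suffixes d in 0..s^t-2 with xi_d = value, coeffs = (a_0, ..., a_{t-1})."""
    t = len(coeffs)
    xi = [0] * (t - 1) + [1]              # xi_0 = ... = xi_{t-2} = 0, xi_{t-1} = 1
    while len(xi) < s ** t - 1:
        xi.append(sum(a * x for a, x in zip(coeffs, xi[-t:])) % s)
    return [d for d, x in enumerate(xi) if x == value]


def resolvable_design(coeffs, s):
    """List of replications; each replication is a list of blocks."""
    t = len(coeffs)
    m = s ** t - 1
    theta = m // (s - 1)
    base = theorem7_set(coeffs, s)
    # S(theta): the sets reached after 0, theta, 2*theta, ... steps
    first = [sorted((d + j * theta) % m for d in base) for j in range(s - 1)]
    used = {x for blk in first for x in blk}
    # R: the remaining residues together with the invariant element
    first.append([INF] + sorted(set(range(m)) - used))
    # PC(theta): partial cycle of theta steps
    return [[[x if x == INF else (x + shift) % m for x in blk] for blk in first]
            for shift in range(theta)]


def pair_counts(replications):
    """Number of blocks containing each unordered pair of varieties."""
    return Counter(frozenset(p) for rep in replications for blk in rep
                   for p in combinations(blk, 2))


if __name__ == "__main__":
    reps = resolvable_design((1, 0, 0, 1), 2)     # x^4 - x^3 - 1 over GF(2)
    print(reps[0])
    print(set(pair_counts(reps).values()))
```

The 15 replications of 2 blocks each give b = 30, and every pair of the 16 varieties occurs together λ = 7 times.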
Theorem 8: If s is a prime or a prime power and $\theta = (s^{t+1} - 1)/(s^2 - 1)$ is
not integral, it is possible to find $\eta = (s^{t} - 1)/(s^2 - 1)$ sets

$(d_{0j}, d_{1j}, \dots, d_{sj}), \quad j = 1, \dots, \eta$

such that all the differences mod $v = (s^{t+1} - 1)/(s - 1)$ contain integers less than v
once and once only.

For the actual method of construction references may be made to Rao (1945,
1946). The solution obtained for PG(4, 2) : 1, with parameters 31, 155, 3, 15, 1, is
given in Table 2.
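The defining property in Theorem 8 can be checked directly for PG(4, 2) : 1, where v = 31 and η = 5. The sketch below takes the five initial sets (0, 1, 18), (0, 2, 5), (0, 4, 10), (0, 8, 20), (0, 9, 16) as listed for this design and counts the differences mod 31; the function name is a hypothetical helper.

```python
from collections import Counter

PG42_LINES = [(0, 1, 18), (0, 2, 5), (0, 4, 10), (0, 8, 20), (0, 9, 16)]


def difference_counts(initial_sets, v):
    """Count each non-zero difference mod v arising within the initial sets."""
    return Counter((a - b) % v
                   for blk in initial_sets for a in blk for b in blk if a != b)


if __name__ == "__main__":
    counts = difference_counts(PG42_LINES, 31)
    print(sorted(counts) == list(range(1, 31)), set(counts.values()))
```

Every integer from 1 to 30 appears exactly once, so cyclic development mod 31 gives the 155 blocks with λ = 1.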

Theorem 9: If s is a prime or prime power, there exist $\eta = (s^{t-1} - 1)/(s - 1)$ sets

$(d_{1j}, \dots, d_{sj}), \quad j = 1, \dots, \eta$

such that all the differences mod v (= $s^t - 1$) contain integers less than v and not divisible
by θ = v/(s − 1) once and those divisible by θ, zero times.

If we add to the difference set of Theorem 9 the set (∞, 0, θ, ...) with a
partial cycle θ, we obtain a compact representation of the BIBD, EG(t, s) : 1, with
the parameters

$v = s^t,\quad b = s^{t-1}(s^t - 1)/(s - 1),\quad k = s,\quad r = (s^t - 1)/(s - 1),\quad \lambda = 1.$

The solution for EG(3,3) : 1 is given in Table 2.

Theorems providing difference sets for the designs PG(t, s) : d and EG(t, s) : d
for values of d other than those covered by Theorems 6, 7, 8, 9 are given in Rao
(1945, 1946). They have not been quoted here as they do not provide solutions to any
of the designs listed in Table 1.


TABLE 2. CYCLIC SOLUTIONS TO BIB DESIGNS DERIVABLE FROM FINITE
GEOMETRICAL CONFIGURATIONS


ref. no.          solution

Non-resolvable designs
PG(2,11) : 1      (1, 2, 4, 13, 21, 35, 39, 82, 89, 95, 105, 110) mod 133
PG(2,13) : 1      (0, 1, 3, 24, 41, 52, 57, 66, 70, 96, 102, 149, 164, 176) mod 183
PG(3,3) : 1       (0, 1, 26, 32), (0, 7, 19, 36), (0, 3, 16, 38) mod 40, PC(10) (0, 10, 20, 30) mod 40
PG(3,3) : 2       (0, 1, 2, 5, 12, 18, 22, 24, 26, 27, 29, 32, 33) mod 40
PG(4,2) : 3       (i) (0, 1, 2, 3, 5, 6, 8, 11, 12, 18, 19, 20, 23, 27, 29) mod 31
(3 solutions)     (ii) (1, 2, 4, 5, 7, 8, 9, 10, 14, 16, 18, 19, 20, 25, 28) mod 31
                  (iii) (1, 2, 3, 4, 6, 8, 12, 15, 16, 17, 23, 24, 27, 29, 30) mod 31
PG(4,2) : 1       (0, 1, 18), (0, 2, 5), (0, 4, 10), (0, 8, 20), (0, 9, 16) mod 31
EG(3,3) : 1       (0, 1, 22), (0, 2, 8), (0, 3, 14), (0, 7, 17) mod 26, PC(13) (∞, 0, 13) mod 26

Resolvable designs
EG(2,11) : 1      PC(12)[(0, 9, 27, 29, 46, 50, 76, 104, 107, 114, 115) S(12) + R] mod 120
EG(3,3) : 2       PC(13)[(0, 1, 2, 8, 11, 18, 20, 22, 23) S(13) + R] mod 26
EG(2,13) : 1      PC(14)[(0, 9, 23, 35, 72, 92, 97, 110, 136, 151, 157, 158, 160) S(14) + R] mod 168
EG(4,2) : 3       PC(15)[(3, 4, 5, 6, 8, 10, 11, 14) + R] mod 15
4. CYCLIC SOLUTIONS TO NON-GEOMETRICAL DESIGNS


Bose (1939) discussed some methods of obtaining cyclic solutions and gave
the actual solutions to several series of designs designated by T₁, T₂, F₁, F₂, G₁, G₂,
S₁. Table 3 gives the cyclic solutions obtained by these methods, in so far as they
are applicable, in which cases the series to which each solution belongs has been
indicated, and other solutions obtained by trial and error (indicated by the symbol —).


There are several types of cyclic solutions as may be seen from Table 3. In
a simple cyclic solution such as 35, the varieties are represented by integers
0, 1, ..., v − 1. From the initial set or sets, the whole design is generated by adding
integers 1, 2, ..., v − 1 and reducing to mod v. That is, an integer i in an
initial set is changed to i + 1, ..., v − 1, 0, ..., i − 1 in the derived sets. In some cases
such as 32, the varieties are represented by integers and an invariant element
designated as ∞ (Fisher and Yates, 1953, use I for this purpose). This element remains the
same in all the derived sets.
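Simple cyclic development is easy to carry out and verify mechanically. The sketch below, with hypothetical function names, develops the initial sets (0, 1, 3, 12), (0, 1, 5, 13), (0, 4, 6, 9) mod 19 listed for design 42 (v = 19, b = 57, k = 4, r = 12, λ = 2) and counts how often each pair of varieties occurs together. It does not handle an invariant element; that case would need the element to be passed through unchanged.

```python
from collections import Counter
from itertools import combinations


def develop(initial_sets, v):
    """All blocks obtained by adding 0, 1, ..., v-1 to each initial set, mod v."""
    return [tuple((x + shift) % v for x in blk)
            for blk in initial_sets for shift in range(v)]


def pair_counts(blocks):
    """Number of blocks containing each unordered pair of varieties."""
    return Counter(frozenset(p) for blk in blocks for p in combinations(blk, 2))


if __name__ == "__main__":
    blocks = develop([(0, 1, 3, 12), (0, 1, 5, 13), (0, 4, 6, 9)], 19)
    print(len(blocks), set(pair_counts(blocks).values()))
```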


There are dicyclic solutions such as 43, where the varieties are represented
by (x, y), x = 0, ..., p − 1 and y = 0, 1, ..., q − 1. The cyclic development with
respect to one of the coordinates is carried out first keeping the other fixed. From the
sets so generated others are derived by a cyclic development of the other
coordinate, fixed in the first operation. When the initial sets are given in such a way that
no cyclic development is necessary with respect to one of the coordinates for
obtaining the complete design, a dash is indicated in the symbol mod (p, q), as in 73 (ii).
An additional complication in dicyclic solutions is the introduction of invariant
elements as in 48 and 62. Some designs such as 36 have tricyclic solutions.


Notes on the designs in Table 3: 34 and 41 can also be obtained by block 
section from 35 and 53 respectively. It is not known whether from 76, one can build 
up the symmetrical design 80 with parameters 36, 36, 15, 15, 6, whose existence 
implies that of 76. Design 48, obtained by trial, is the most difficult one, and no 
known technique including the recent methods given by Bose and Shrikhande (1960) 
could yield a solution. A non-isomorphic solution to 32 is given by Skolem (1958). 


Some of the designs in Table 3 are of the resolvable type. But the solutions 
given are not resolvable except in cases where the index (r) is shown with the reference 
number. Two solutions are listed for 32, of which only (i) is resolvable. In the case 
of a resolvable solution, either one complete replication, as in 32 and 33, or the method 
of generating one or more complete replications, as in 66, is shown within square 
brackets. All the replications are obtained by the indicated cyclic development 
of the initial replications. In the case of 66, 


[(01, 06, 15, 12, 23, 24) mod (5, —) + R]
and [(01, 06, 35, 32, 13, 14) mod (5, —) + R]


TABLE 3. CYCLIC SOLUTIONS TO SOME BIB DESIGNS, NOT DERIVABLE FROM
FINITE GEOMETRICAL CONFIGURATIONS


ref. no.   v   b   k   r   λ   method   cyclic solution
32r   12   44   3   11   2   —   (i) [(0, 1, 3), (4, 5, 9), (2, 8, 6), (∞, 7, 10)] mod 11
(two solutions)   E,   (ii) (0, 1, 3), (0, 1, 4), (0, 2, 6), (∞, 0, 5) mod 11

33r   12   33   4   11   3   —   [(0, 1, 3, 7), (2, 4, 9, 10), (∞, 5, 6, 8)] mod 11

84 12. 92 6 ll CES (0, 1, 3, 7, 8, 10), (со, 0; 5, 6, 8, 10) mod 11 

35.2.98 3-4 т 5 S (1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18) mod 23 

ба E аи EE I REAO 2703 (010, 020, 102, 202, 001), (210, 120, 222, 112, 001) 

mod (3, 3, 5) 
(000, 001, 002, 003, 004) mod (3, 3, —) 

а 13 36 8.13 Bol (0, 1, 3, 6, 7, 11), (0, 1, 2, 3, 7, 11) mod 13 , 

42 19 057 4.12 р (0, 1, 3, 12), (0, 1, 5, 13), (0, 4, 6, 9) mod 19 

43759112149 во 8d (00, 05, 14, 11, 22, 23), (00, 01, 03, 10, 11, 19) оа 

45 35 100 CR P dires (9 (0, 1, 3), (0, 4, 13), (0, 5, 11), (0, 7, 17) mod 25 

(two solutions) Ta (4) (01, 41, 13), (10, 33, 12), (32, 21, 02), (11, 24, 20) 
mod (5, 5) 

ив STP ЙГ: 1 -- (00, 01, 12, 15), (01, 03, 08, 10), (0%, 07, 15, 21) 
[varieties are (2, у) 2 = 0, 1,2; mod (3, 11) 
y=0, ...› 10, 9 and (250) | (оо оо, 00, 10, 20) mod (—, 11); (0%, 100, 200, 0000) 

53 27 27 13 13 6 Si [(001, 100, 120, 111, 202, 110, 102, 020, 021, 121, 

211, 022, 221) 1 mod (3, 3, 3) 

ше 5 l 4 o [(0,1, 4,9, 11), (0,1, 4, 10, 12), (co, 0, 1,2, 7)] mod 14 

62 15 35 6 14 5 - [(oo, 00, 10, 11, 12, 14), (оо, 10, 00, 06, 05, 03) 

(01, 02, 04, 10, 11, 13), (02, 08, 00, 10, 11, 13) 
(00, 04, 05, 10, 11, 13)] mod, (—, 7) 
63 22 77 4 14 2 — [ (00, 03, 09, 010), (00, 10, 12, 17), (00, 10, 19, 110) 
(00, 02, 15, 18), (00, 03, 14, 17), (00, 04, 13, 19) 
(00, 05, 12, 16) | mod (—, 11) 

66" 36 84 6 l4 Ro н [ (01, 06, 15, 12, 23, 24) mod (5, —) + E] mod (—, 7) 
[varieties are (x,y), #=0,..., $ [ (01, 06, 35, 32, 13, 14) mod (5, —) + Е] mod (—, 7) 
4,y =0,..., 6, and (0000) ] 

67 43 86 7 14 2 -- (оо 0, 01, 06, 15, 12, 23, 24) mod (5, 7) 
[varieties are (а, y) (оо 0, 01, 06, 35, 32, 13, 14) mod (5, 7) 

PART Е (со 0, coco, 00, 10, 20, 30, 40) mod (—, 7) 
y = 0,:.., 6 and (со, 00) ] (co 0, coco, 00, 10, 20, 30, 40) mod (—, 7) 
(000, 001, 002, 003, 004, 005, 506) mod (-,-) 
(000, 001, 002, 003, 004, соб, 006) mod (—,—). 

71 11 55 3 15 3 — [(0, 1, 3), (0, 1, 5), (0, 2, 7), (0, 1, 8), (0, 3,5) ] mod 11 

72 13 39 5 15 a E [ (0, 1, 2,4, 8), (0, 1, 3, 6, 12), (0, 2, 5, 6, 10)] mod 13 

73 16 80 3 15 aum e (4) (0, 1, 3), (0, 3, 8), (0, 2, 12), (0, 1, 7), (0, 4, 9) mod 16 

(two solutions) Es (й) [(10, 11, 12), (10, 12, 15), (00, 07, 10), (01, 06, 10) 

(02, 05, 10), (03, 04, 10), (01, 07, 10), (02, 06, 10) 
(03, 05, 10), (00, 10, 14)] mod (—, 8) 

74 16. 48 5 15 ЖҮН с [ (0, 1, 2, 4, 7), (0, 1, 8, 5, 10), (0, 1, 3, 7, 11) | mod 16 

75 16 40 6 15 5 сет [ (0, 1, 3, 5, 9, 12), (0, 1, 2, 3, 6, 12) | mod 16 
РС (0, 8, 1, 9, 2, 10) mod 16 

76 21 35 9 15 6 — [(00, 01, 02, 04, 10, 11, 12, 14, 22), (00, 06, 05, 03, 26 
24, 23, 22, 10), (10, 16, 15, 13, 26, 24, 23, 22, 00), 
(04, 01, 03, 10, 12, 16, 24, 21, 22), (00, 02, 06, 14, 11, 
13, 24, 21, 22) ] mod (— 7) 

$4 в 188 - 5. 18 1 6 ACE des TR (4, 36, 19, 49,114), (16, 22, 15, 13, 


give one replication each. The first expression gives the replication 

(01, 06, 15, 12, 23, 24), (31, 36, 45, 42, 03, 04) 

(11, 16, 25, 22, 33, 34), (41, 46, 05, 02, 13, 14) 

(21, 26, 35, 32, 43, 44), (∞∞, 00, 10, 20, 30, 40) 
and similarly the second expression gives another replication. The rest of the 
replications are generated by cyclic development with respect to the second coordinate. 
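
The development just described can be checked mechanically. The following is a minimal Python sketch, not part of the original paper. It assumes that the term R stands for the block (∞∞, 00, 10, 20, 30, 40) shown in the replication above; the names `INF`, `develop_first_coordinate`, `replication_from` and `is_complete_replication` are hypothetical helpers introduced for illustration.

```python
# Varieties are pairs (x, y) with x in Z_5 and y in Z_7, plus one extra
# variety written INF here (the infinity symbol of the text).
INF = "inf"

def develop_first_coordinate(base_block, modulus=5):
    """Blocks obtained by adding t = 0, ..., modulus-1 to the first coordinate."""
    return [[((x + t) % modulus, y) for (x, y) in base_block]
            for t in range(modulus)]

def replication_from(base_block):
    """Five developed blocks plus the block (INF, 00, 10, 20, 30, 40).

    Taking this extra block as the term R is an assumption.
    """
    extra = [INF] + [(x, 0) for x in range(5)]
    return develop_first_coordinate(base_block) + [extra]

def is_complete_replication(blocks):
    """True if the blocks contain each of the 36 varieties exactly once."""
    seen = [variety for block in blocks for variety in block]
    everything = {(x, y) for x in range(5) for y in range(7)} | {INF}
    return len(seen) == len(everything) and set(seen) == everything

first = [(0, 1), (0, 6), (1, 5), (1, 2), (2, 3), (2, 4)]
second = [(0, 1), (0, 6), (3, 5), (3, 2), (1, 3), (1, 4)]
rep1 = replication_from(first)
rep2 = replication_from(second)
print(is_complete_replication(rep1), is_complete_replication(rep2))
```

Under these assumptions both expressions yield six blocks that together contain every variety once, which is what resolvability requires of a single replication.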


Four more solutions, nos. 56, 65, 77 and 79, have been found by trial and 
error. In the case of no. 77, the 26 varieties are represented by (x, y), x = 0, ..., 4; 
y = 0, ..., 4 and ∞. The dicyclic solution in blocks of size 6 is 

[(∞, 00, 13, 21, 34, 42), (∞, 00, 12, 24, 31, 43), (∞, 00, 10, 20, 30, 40)] mod (—, 5) 
[(01, 04, 12, 13, 21, 24), (01, 04, 22, 23, 32, 33)] mod (5, 5) 

For no. 79, the parameters are v = 31, b = 93, k = 5, r = 15, λ = 2. A 
primitive residue of 31 is 3. Consider the set (3^0, 3^6, 3^12, 3^18, 3^24), which is the same 
as (1, 2, 4, 8, 16) with an internal multiplier 2. From this, two more sets are 
generated by successively multiplying by 3: (3, 6, 12, 24, 17) and (9, 18, 5, 10, 20). 
The cyclic solution (in 3 cycles generating 93 blocks) is 

[(1, 2, 4, 8, 16), (3, 6, 12, 24, 17), (9, 18, 5, 10, 20)] mod 31 

Similarly the solution for no. 65 with parameters v = 29, b = 58, k = 7, r = 14, 
λ = 3 is 

[(1, 7, 16, 20, 23, 24, 25), (3, 21, 19, 2, 11, 14, 17)] mod 29 
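
The two cyclic solutions above can be verified with the usual method of differences: the initial blocks, developed mod v, give a BIB design with index λ when every non-zero residue occurs exactly λ times among the differences within the initial blocks. The Python sketch below is an illustration only; the function names are hypothetical.

```python
from collections import Counter

def difference_counts(base_blocks, v):
    """Count how often each residue arises as a difference a - b (mod v)."""
    counts = Counter()
    for block in base_blocks:
        for a in block:
            for b in block:
                if a != b:
                    counts[(a - b) % v] += 1
    return counts

def is_cyclic_solution(base_blocks, v, lam):
    """True if every non-zero residue mod v occurs exactly lam times."""
    counts = difference_counts(base_blocks, v)
    return all(counts[d] == lam for d in range(1, v))

def develop(base_blocks, v):
    """All blocks obtained by adding 0, 1, ..., v-1 to each initial block."""
    return [sorted((x + t) % v for x in block)
            for block in base_blocks for t in range(v)]

no_79 = [[1, 2, 4, 8, 16], [3, 6, 12, 24, 17], [9, 18, 5, 10, 20]]
no_65 = [[1, 7, 16, 20, 23, 24, 25], [3, 21, 19, 2, 11, 14, 17]]
print(is_cyclic_solution(no_79, 31, 2), len(develop(no_79, 31)))  # True 93
print(is_cyclic_solution(no_65, 29, 3), len(develop(no_65, 29)))  # True 58
```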

For no. 56, represent the varieties by (x, y), x = 0, ..., 4; y = 0, ..., 12 and ∞. 

The solution is 
[(00, 0 12, 11, 1 11, 24, 28), (01, 0 11, 20, 2 12, 34, 38)] mod (5, 13) 
(∞, 00, 10, 20, 30, 40) mod (—, 13) 

A simple method of constructing resolvable BIBD for the series v = s^2, 
b = s(s+1), k = s, λ = 1 has been found when GF(s) exists. One need not go 
through the construction of Theorem 7, although the result of Theorem 7 is 
interesting from the number-theoretic view point. Let α_0, ..., α_{s−1} represent the 
elements of GF(s) and 0, 1, ..., s−1 the module M(s) of residue classes mod s, 
and represent the varieties by (x, y) where x ∈ M(s) and y ∈ GF(s). Consider the 
set of pairs (x, y) 

(0 α_0, 1 α_1, ..., (s−1) α_{s−1}). 

On cyclic development of y with respect to GF(s) one replication is obtained. The 
initial set for another replication is obtained by multiplying y in the first set by A, 
an element of GF(s). Thus 

(0 Aα_0, 1 Aα_1, ..., (s−1) Aα_{s−1}) 

gives on cyclic development of y another replication. Since A can have s values 
we obtain s replications. The (s+1)-th replication is obtained by the development 
with respect to x of 

(0 α_0, 0 α_1, ..., 0 α_{s−1}) mod (s, —). 
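
The construction can be sketched in Python for the special case where s is a prime, so that GF(s) is simply the integers mod s. Taking α_x = x and the function names below are assumptions made for illustration; a prime-power s would need genuine finite-field arithmetic, which this sketch does not attempt.

```python
from itertools import combinations

def resolvable_design(s):
    """Replications for v = s*s, k = s, assuming s is prime (GF(s) = Z_s)."""
    replications = []
    for multiplier in range(s):
        # Initial set (x, multiplier * alpha_x), with alpha_x = x assumed.
        initial = [(x, (multiplier * x) % s) for x in range(s)]
        # Develop the second coordinate y through all s field elements.
        replications.append([[(x, (y + t) % s) for (x, y) in initial]
                             for t in range(s)])
    # The (s+1)-th replication: develop (0 a_0, ..., 0 a_{s-1}) over x.
    replications.append([[(x, y) for y in range(s)] for x in range(s)])
    return replications

def pair_counts(replications):
    """Number of blocks containing each unordered pair of varieties."""
    counts = {}
    for replication in replications:
        for block in replication:
            for pair in combinations(sorted(block), 2):
                counts[pair] = counts.get(pair, 0) + 1
    return counts

reps = resolvable_design(5)
counts = pair_counts(reps)
print(len(reps), sum(len(rep) for rep in reps), set(counts.values()))
# prints: 6 30 {1}
```

For s = 5 this gives 6 replications of 5 blocks each, and every one of the 300 pairs of varieties occurs in exactly one block, in agreement with b = s(s+1) and λ = 1.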
For no. 87, represent the varieties by (x, y), x = 0, ..., 6; y = 0, ..., 12 and 
∞. The solution is 
(00, 0 12, 14, 18, 51, 5 11, 26), (33, 39, 42, 4 10, 65, 67, 26) mod (7, 13) 
(00, 10, 20, 30, 40, 50, 60) mod 13 


Note added in proof: I am thankful to Prof. S. S. Shrikhande for pointing 
out that the solutions for nos. 41, 42, 43, 45, 65, 73, 74, 79 obtained by me also 
follow from the general series given by Sprott (1954). 
1. MOTIVATION 


A polynomial f(x) of degree n, irreducible in the field GF(p) where p is a prime 
number, is called a minimum function if a root w of the equation f(x) = 0 serves 
as a primitive element of GF(p^n), that is, if w^0 = 1, w, w^2, ..., w^(p^n − 2) are the 
p^n − 1 non-zero elements of GF(p^n). It is known that for GF(p^n) there are exactly 
φ(p^n − 1)/n minimum functions, where φ is the Euler function, p a prime, and n an integer. 
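
Both the count φ(p^n − 1)/n and the defining property can be tried out on small cases. The Python sketch below is illustrative only and is not the search procedure used by the authors. It represents a monic polynomial by its coefficient list, lowest degree first, assumes n ≥ 2, and tests whether the residue class of x has multiplicative order exactly p^n − 1 modulo f; all function names are hypothetical.

```python
def prime_factors(m):
    """Set of prime factors of m, by trial division."""
    factors, d = set(), 2
    while d * d <= m:
        while m % d == 0:
            factors.add(d)
            m //= d
        d += 1
    if m > 1:
        factors.add(m)
    return factors

def euler_phi(m):
    result = m
    for q in prime_factors(m):
        result -= result // q
    return result

def count_minimum_functions(p, n):
    """phi(p^n - 1) / n, the number quoted in the text."""
    return euler_phi(p ** n - 1) // n

def multiply_mod(a, b, f, p):
    """Product of polynomials a and b modulo the monic f, coefficients mod p."""
    n = len(f) - 1
    prod = [0] * (2 * n - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    for d in range(2 * n - 2, n - 1, -1):      # eliminate x^d for d >= n
        c = prod[d]
        for k in range(n + 1):
            prod[d - n + k] = (prod[d - n + k] - c * f[k]) % p
    return prod[:n]

def power_of_x(e, f, p):
    """x^e modulo f, by repeated squaring (assumes degree n >= 2)."""
    n = len(f) - 1
    result = [1] + [0] * (n - 1)
    base = [0, 1] + [0] * (n - 2)
    while e:
        if e & 1:
            result = multiply_mod(result, base, f, p)
        base = multiply_mod(base, base, f, p)
        e >>= 1
    return result

def is_minimum_function(f, p):
    """True if x has order exactly p^n - 1 modulo the monic polynomial f."""
    n = len(f) - 1
    order = p ** n - 1
    one = [1] + [0] * (n - 1)
    if power_of_x(order, f, p) != one:
        return False
    return all(power_of_x(order // q, f, p) != one
               for q in prime_factors(order))

print(count_minimum_functions(13, 3))           # 240
print(is_minimum_function([1, 1, 0, 0, 1], 2))  # x^4 + x + 1 over GF(2): True
print(is_minimum_function([1, 0, 1], 3))        # x^2 + 1 over GF(3): False
```

If x has order p^n − 1 in the residue ring, every non-zero residue is a power of x and hence invertible, so the ring is a field and f is irreducible; the order test therefore covers irreducibility as well.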


Minimum functions were used successfully in the past in constructing sets 
of mutually Orthogonal Latin Squares, Balanced Incomplete Block designs, Confounded 
and Fractional Factorial designs. Recently, these have found a new application in 
the construction of error-correcting codes in the theory of information. 


While searching for a minimum function of GF(13%) at the request of Professor 
С. В. Вю, who needed it for his research work on Designs, we noticed a lack of 
comprehensive tables of minimum functions in the published literature. The authors 
are now working on such a set of tables of minimum functions for generating GF(p"). 
This will extend considerably the table given in Carmichael (1937). 


2. TABLE 


Only one minimum function for each one of the cases p = 11, 13, 17 and 
n = 3, 4, 5 is given below: 


СЕ(113) : f(X) = Хэ X2.3, 
GF(115) : f(X) = XA-LAX2-L2, 
GF(115) : f(X) = X5--3X*-EX*-L x43, 
GF(135) : f(X) = ХЗ X142, 
GF(134) : f(X) = X44.X843x249, 
GF(135) : f(X) = Хер X34 X42. 
GE(11)) : f(X) = X9-- ХЗ, 
GFT) : f(X) = X414X24 X 13, ы 
6Е (175) ДА) = X54 Ха Хз. 
REFERENCE 


CARMICHAEL, R. D. (1937): Introduction to the Theory of Groups of Finite Order, Boston, Ginn and Company. 
ANALYSIS OF TWO-WAY DESIGNS 


By J. ROY and K. R. SHAH 


Indian Statistical Institute 


SUMMARY. Methods are given for analysis of general two-way designs with recovery of 
information from row and column-contrasts, using additivity of plot and treatment effects and the physical 
fact of randomisation. Column-balanced designs are discussed in particular with a numerical illustration. 


1. INTRODUCTION 



Designs for two-way elimination of heterogeneity have been considered by 
various authors: Bose and Kishen (1939), Fisher (1936), Rao (1943, 1946), Yates (1937), 
and Youden (1937). A general method of analysis under the so-called fixed effects 
Normal model was given by Shrikhande (1951). The purpose of this paper is to 
validate Shrikhande's (1951) results and to derive methods of combined estimation 
after recovery of information from row- and column-contrasts using only the 
assumption of additivity of plot and treatment effects, and the physical fact of randomisation. 


To estimate treatment effects, three sets of contrasts, namely row-contrasts, 
column-contrasts and interaction-contrasts, are introduced and their distribution 
induced by the randomisation is considered. These contrasts are all uncorrelated. 
But since the contrasts in different sets have different variances, it is not possible 
to combine them effectively unless the ratios of these variances are known. Therefore, 
best linear unbiased estimates are obtained first from interaction-contrasts only, 
in Section 3.1. The equations for estimation turn out to be the same as those obtained 
by Shrikhande under the Normal model. 


The equations for combined estimation after recovery of information from 
row- and column-contrasts are given in Section 3.2. Methods for estimating the 
variance ratios that are required in this problem are given in Sections 4.2 and 4.3. 
The analysis of variance is shown in Section 4.1. Conditions under which a two-way 
design compares favourably with the corresponding one-way designs are examined in 
Section 5 and the relative efficiency-factors worked out. 
Though the analysis requires rather heavy computations in the general case, it 
is shown in Section 6 that if the columns of the design, ignoring rows, form a balanced 
incomplete block design the analysis is much simpler. The special case where columns 
are balanced and rows partially balanced is discussed in full and a numerical example 
is given in Section 8. All derivations are given in Section 7. 
2. PRELIMINARIES 


2.1. The additive model. Suppose there are mn plots or experimental units 
(eu's) on which a comparative trial involving v treatments is to be carried out. The 
eu's are arranged in an m×n two-way classification, so that each eu is determined by 
a pair of co-ordinates (t, u), t = 1, 2, ..., m; u = 1, 2, ..., n. With the (t, u)-th eu 
is associated a number $x_{tu}$, to be called the plot effect, and we assume that if the k-th 
treatment is applied on the (t, u)-th eu, the 'yield' would be $x_{tu} + \theta_k$, where the 
parameter $\theta_k$ is to be regarded as the effect of the k-th treatment; k = 1, 2, ..., v. This 
is the so-called additive or no-interaction model. The purpose of the experiment is to 
compare the $\theta_k$'s. 

We now define 

$\mu = \sum_t \sum_u x_{tu}/mn$, the general mean, 
$\sigma_1^2 = \sum_t \rho_t^2/(m-1)$, the between-row variance,                  ... (2.1) 
$\sigma_2^2 = \sum_u \gamma_u^2/(n-1)$, the between-column variance, 
$\sigma_0^2 = \sum_t \sum_u \tau_{tu}^2/[(m-1)(n-1)]$, the interaction variance, 

where $\rho_t = \frac{1}{n}\sum_u x_{tu} - \mu$, $\gamma_u = \frac{1}{m}\sum_t x_{tu} - \mu$ and $\tau_{tu} = x_{tu} - \rho_t - \gamma_u - \mu$. We shall write 

$w_i = \sigma_0^2/\sigma_i^2$, i = 1, 2                  ... (2.2) 

for the ratios of the variances. 
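
As a small worked illustration of definitions (2.1) and (2.2), the Python sketch below computes the general mean and the three variances from a table of plot effects. The 2×2 table of plot effects is made up for the example, and the function name is hypothetical.

```python
def variance_components(x):
    """Return (mu, sigma1_sq, sigma2_sq, sigma0_sq) for an m x n table x."""
    m, n = len(x), len(x[0])
    mu = sum(sum(row) for row in x) / (m * n)
    rho = [sum(row) / n - mu for row in x]                       # row deviations
    gamma = [sum(x[t][u] for t in range(m)) / m - mu for u in range(n)]
    tau = [[x[t][u] - rho[t] - gamma[u] - mu for u in range(n)]  # interaction part
           for t in range(m)]
    sigma1_sq = sum(r * r for r in rho) / (m - 1)
    sigma2_sq = sum(g * g for g in gamma) / (n - 1)
    sigma0_sq = sum(v * v for row in tau for v in row) / ((m - 1) * (n - 1))
    return mu, sigma1_sq, sigma2_sq, sigma0_sq

plot_effects = [[0, 2], [4, 10]]          # hypothetical plot effects
mu, s1, s2, s0 = variance_components(plot_effects)
w1, w2 = s0 / s1, s0 / s2                 # the ratios of (2.2)
print(mu, s1, s2, s0, w2)                 # 4.0 18.0 8.0 4.0 0.5
```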


2.2. The design. The treatments are allocated to the eu's in the following 
manner. First a two-way design, that is, an arrangement of the v treatments in m 
rows and n columns, is taken. The design is thus completely characterized by the 
numbers $\epsilon_{ijk}$, i = 1, 2, ..., m; j = 1, 2, ..., n; k = 1, 2, ..., v, where $\epsilon_{ijk} = 1$ if the 
k-th treatment occurs in the intersection of the i-th row and the j-th column of the 
design and $\epsilon_{ijk} = 0$ otherwise. The k-th treatment thus occurs in $m_{ki} = \sum_{j=1}^{n} \epsilon_{ijk}$ 
positions in the i-th row and in $n_{kj} = \sum_{i=1}^{m} \epsilon_{ijk}$ positions in the j-th column. We shall 
restrict ourselves to equi-replicate designs, that is, to those designs where each 
treatment occurs altogether in r positions. Thus $\sum_i m_{ki} = \sum_j n_{kj} = r$ and of course 
$\sum_k m_{ki} = n$, $\sum_k n_{kj} = m$. We shall call $M = ((m_{ki}))$ and $N = ((n_{kj}))$ the row 
incidence-matrix and the column incidence-matrix respectively. The rows and the columns of 
the design are then allotted to the two ways of classification of the eu's 
independently and at random. 
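
The incidence matrices can be built directly from a lay-out. The Python sketch below is an illustration under stated assumptions: treatments are numbered 0, ..., v−1, the lay-out is a list of rows, and the 3×7 example lay-out (a cyclic arrangement with shifts 0, 1, 3 mod 7) is chosen for the example and does not come from the paper. The function names are hypothetical.

```python
def incidence_matrices(layout, v):
    """Return (M, N) with M[k][i] = m_ki and N[k][j] = n_kj."""
    m, n = len(layout), len(layout[0])
    M = [[0] * m for _ in range(v)]
    N = [[0] * n for _ in range(v)]
    for i, row in enumerate(layout):
        for j, k in enumerate(row):
            M[k][i] += 1   # treatment k occurs once more in row i
            N[k][j] += 1   # and once more in column j
    return M, N

def replication_number(M):
    """Common number of positions r per treatment, or None if not equi-replicate."""
    totals = {sum(row) for row in M}
    return totals.pop() if len(totals) == 1 else None

# Hypothetical lay-out: column j holds treatments j, j+1, j+3 (mod 7).
layout = [[(j + shift) % 7 for j in range(7)] for shift in (0, 1, 3)]
M, N = incidence_matrices(layout, 7)
r = replication_number(M)
row_sizes = [sum(M[k][i] for k in range(7)) for i in range(3)]
column_sizes = [sum(N[k][j] for k in range(7)) for j in range(7)]
print(r, row_sizes, column_sizes)
```

Here every treatment occurs r = 3 times, each row holds n = 7 treatments and each column m = 3, as the identities in the text require.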


2.3. Consequences of randomisation. Let us denote the yield of the eu 
corresponding to the i-th row and the j-th column of the design by $y_{ij}$. The 
randomisation procedure ensures that 

$E(y_{ij}) = \mu + \sum_k \epsilon_{ijk}\theta_k$                  ... (2.3) 

and that 

$\mathrm{cov}(y_{ij}, y_{i'j'}) = (\delta_{ii'} - \frac{1}{m})\sigma_1^2 + (\delta_{jj'} - \frac{1}{n})\sigma_2^2 + (\delta_{ii'} - \frac{1}{m})(\delta_{jj'} - \frac{1}{n})\sigma_0^2$                  ... (2.4) 

where $\delta_{ii'}$ is the Kronecker symbol, $\delta_{ii'} = 1\,(0)$ if $i = i'$ $(i \ne i')$. 

2.4. A linear transformation. Since the $y_{ij}$'s are correlated, it is convenient 
to make a linear transformation and obtain uncorrelated random variables. For 
this purpose, we use the following definitions. A linear function of the form 
$l = \sum\sum l_{ij} y_{ij}$ is said to be a contrast if $\sum\sum l_{ij} = 0$. A contrast l is said to belong 
to rows, or simply called a row-contrast, if $l_{i1} = l_{i2} = \ldots = l_{in}$ holds for i = 1, 2, ..., m. 
Similarly, a contrast l is said to be a column-contrast if $l_{1j} = l_{2j} = \ldots = l_{mj}$ holds for 
j = 1, 2, ..., n. A contrast l is said to belong to interaction, or simply called an 
interaction-contrast, if $\sum_i l_{ij} = 0$ for j = 1, 2, ..., n and $\sum_j l_{ij} = 0$ for i = 1, 2, ..., m. 

A contrast l is said to be normalised if $\sum\sum l_{ij}^2 = 1$. Two contrasts l and 
$l' = \sum\sum l'_{ij} y_{ij}$ are said to be orthogonal if $\sum\sum l_{ij} l'_{ij} = 0$ holds. 

If then we make a linear transformation from the $y_{ij}$'s to (i) $G/\sqrt{mn}$, where 
$G = \sum\sum y_{ij}$ is the grand total, (ii) a set of (m−1) mutually orthogonal normalised 
row-contrasts, (iii) a set of (n−1) mutually orthogonal normalised column-contrasts, 
and (iv) a set of (m−1)(n−1) mutually orthogonal normalised interaction-contrasts, it 
can then be shown as in Section 7.1 that the transformation is orthogonal and that these 
transformed variables are uncorrelated; the variance of any normalised row-contrast 
being $n\sigma_1^2$, the variance of any normalised column-contrast being $m\sigma_2^2$ and that of any 
normalised interaction-contrast being $\sigma_0^2$. Since the expectation of each contrast 
is a linear function of the $\theta_k$'s, the method of least-squares can be used for purposes of 
estimation. 

2.5. Notation. We shall write $R_i$ for the total yield of the i-th row, $C_j$ for 
that of the j-th column, and $T_k$ for that of the k-th treatment; thus 

$R_i = \sum_{j=1}^{n} y_{ij}$, $C_j = \sum_{i=1}^{m} y_{ij}$ and $T_k = \sum_{i=1}^{m}\sum_{j=1}^{n} \epsilon_{ijk} y_{ij}$. 

We shall use the matrix notations: $R = (R_1, R_2, \ldots, R_m)$, $C = (C_1, C_2, \ldots, C_n)$, 
$T = (T_1, T_2, \ldots, T_v)$ and $\theta = (\theta_1, \theta_2, \ldots, \theta_v)$. Also, a matrix of the form p×q 
with all elements unity will be denoted by $E_{pq}$. 

If A is a positive semi-definite matrix of form a×a and rank b, it has b positive 
latent roots, say $\alpha_i$, i = 1, 2, ..., b. Let $\xi_i$ of the form 1×a be a latent vector of A 
corresponding to the latent root $\alpha_i$, i = 1, 2, ..., b, such that $\xi_i \xi_j' = \delta_{ij}$. Then the 
matrix $A^{*} = \sum_{i=1}^{b} \frac{1}{\alpha_i}\xi_i'\xi_i$ will be called a pseudo-inverse of A, in the sense of Rao (1955). 
3. ESTIMATION OF TREATMENT EFFECTS 

3.1. Estimation from interaction-contrasts. Since row-contrasts, column-contrasts 
and interaction-contrasts have different variances, it is not convenient to 
use them simultaneously for estimation of treatment effects in an efficient way unless 
the relative magnitudes of these variances are known. We shall, therefore, consider 
first the problem of estimation from interaction-contrasts only. Recovery of information 
provided by row-contrasts and column-contrasts will be taken up in Section 3.2. 

As we have pointed out in Section 2.4, any set of (m−1)(n−1) mutually 
orthogonal normalised interaction-contrasts are mutually uncorrelated and each of 
them has the same variance $\sigma_0^2$. Also, the expectation of each is a linear function of the 
$\theta_k$'s. Consequently, the method of least-squares can be used to derive linear unbiassed 
estimators with minimum variance ('best' estimators) of linear functions of treatment 
effects. As will be shown later in Section 7.2, the method of least-squares gives the 
equations: 

$\theta K = Q$                  ... (3.1) 

where the elements of 

$Q = T - \frac{1}{n}RM' - \frac{1}{m}CN' + \frac{rG}{mn}E_{1v}$                  ... (3.2) 

are called the 'adjusted yields' of the treatments and 

$K = rI - \frac{1}{n}MM' - \frac{1}{m}NN' + \frac{r^2}{mn}E_{vv}$                  ... (3.3) 

will be called the coefficient-matrix of the two-way design. 

Since $KE_{v1} = 0$, rank (K) ≤ v−1. A two-way design will be said to be 
doubly connected if its coefficient matrix K is of rank (v−1). In whatever follows, 
we shall assume that the two-way design is doubly connected. 
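
The quantities in (3.2) and (3.3) are easy to compute exactly for a small lay-out. The Python sketch below is an illustration, not the authors' computing scheme. It assumes an equi-replicate lay-out with treatments numbered 0, ..., v−1 and integer yields, and it uses a hypothetical 3×7 lay-out with made-up additive yields; all function names are hypothetical.

```python
from fractions import Fraction

def incidence(layout, v):
    m, n = len(layout), len(layout[0])
    M = [[0] * m for _ in range(v)]
    N = [[0] * n for _ in range(v)]
    for i, row in enumerate(layout):
        for j, k in enumerate(row):
            M[k][i] += 1
            N[k][j] += 1
    return M, N

def coefficient_matrix(layout, v):
    """K = rI - MM'/n - NN'/m + (r^2/mn) E, as in (3.3)."""
    m, n = len(layout), len(layout[0])
    M, N = incidence(layout, v)
    r = sum(M[0])                      # assumes an equi-replicate design
    K = []
    for a in range(v):
        row = []
        for b in range(v):
            mm = sum(M[a][i] * M[b][i] for i in range(m))
            nn = sum(N[a][j] * N[b][j] for j in range(n))
            row.append((r if a == b else 0) - Fraction(mm, n)
                       - Fraction(nn, m) + Fraction(r * r, m * n))
        K.append(row)
    return K

def adjusted_yields(layout, y, v):
    """Q = T - RM'/n - CN'/m + (rG/mn) E, as in (3.2); y holds integers."""
    m, n = len(layout), len(layout[0])
    M, N = incidence(layout, v)
    r = sum(M[0])
    R = [sum(row) for row in y]
    C = [sum(y[i][j] for i in range(m)) for j in range(n)]
    G = sum(R)
    T = [0] * v
    for i in range(m):
        for j in range(n):
            T[layout[i][j]] += y[i][j]
    return [T[k] - Fraction(sum(M[k][i] * R[i] for i in range(m)), n)
            - Fraction(sum(N[k][j] * C[j] for j in range(n)), m)
            + Fraction(r * G, m * n) for k in range(v)]

def rank(matrix):
    """Rank by exact Gaussian elimination."""
    rows = [list(row) for row in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

# Hypothetical lay-out and purely additive yields: row + column + treatment.
layout = [[(j + shift) % 7 for j in range(7)] for shift in (0, 1, 3)]
theta = [5 * k for k in range(7)]
y = [[3 * i + 2 * j + theta[layout[i][j]] for j in range(7)] for i in range(3)]
K = coefficient_matrix(layout, 7)
Q = adjusted_yields(layout, y, 7)
theta_K = [sum(theta[k] * K[k][c] for k in range(7)) for c in range(7)]
print(rank(K), Q == theta_K)           # 6 True
```

In this example K has rank v − 1 = 6, so the lay-out is doubly connected, and because the made-up yields are exactly additive the adjusted yields Q coincide with θK: the row and column effects drop out entirely.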


It is well known from the theory of least-squares (see Rao, 1952) that any 
linear parametric function of the form $\Theta = \sum_{k=1}^{v} l_k\theta_k$ with $\sum_{k=1}^{v} l_k = 0$ admits linear 
unbiassed estimators, and amongst them the one with minimum variance is $\mathcal{T} = \sum_{k=1}^{v} l_k t_k$, 
where $t = (t_1, t_2, \ldots, t_v)$ is any solution of (3.1). To obtain the variance of $\mathcal{T}$, express 
it in the alternative form $\mathcal{T} = \sum_{k=1}^{v} m_k Q_k$, and then $V(\mathcal{T}) = (\sum_{k=1}^{v} l_k m_k)\sigma_0^2$. 

It may be noted that the equations (3.1) are the same as obtained by Shrikhande 
(1951) from the so-called 'Normal' model. The present approach demonstrates the 
robustness of this procedure and is aesthetically more satisfying to the authors. 


3.2. Recovery of information from row-contrasts and column-contrasts. In 
the previous section, we had simply thrown away the row-contrasts and the 
column-contrasts. If the ratios $w_i = \sigma_0^2/\sigma_i^2$, i = 1, 2 are known, the method of weighted 
least-squares can be applied on all the three sets of contrasts simultaneously. If the 
weight for the normalised interaction-contrasts is taken as unity, the weight for 
normalised row-contrasts will be $w_1/n$ and that for normalised column-contrasts will 
be $w_2/m$. As will be shown in Section 7.2, the method of weighted least-squares now 
gives the equations: 


eK — Q ... (3.4) 
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where | ; 
Nd sm RM Cro GN м ( Pee аты! Кыл; 
RS EDO pk мм’ шк, мм (1 zik. P )Е,,... (3.6) 
and { AS ma VAS aa 5 (87) 


Then the best estimator of @ = 2 % 0, where У z Š h = 0, is given by ue х h i, 
where ¢ = (АҺ, й, ..., &)18 any Mn of (3.4). The variance of 7 is most “Шү 


obtained by writing it in the form Х 7,0, and then (Т) — ( X Im) о}. 
k=1 k=l 


Generally, however, the parameters A, and A, would be unknown and 
estimates D, and D, for them may have to be substituted. In this case, of course, 
the ‘bestness’ of T as an estimator of @ would no longer hold, but conceivably if 
Dy, D, are at all good estimators of Ау, А,; 7 might yet be better than 7, Methods 
for estimating the A’s are given in Section 4.2. 


4, ANALYSIS OF VARIANCE 


4.1. Analysis of variance of interaction-contrasts. To estimate $\sigma_0^2$ and to carry 
out an omnibus test of significance of treatment differences, the analysis of variance 
is to be done as shown in the following table. 


TABLE 4.1. ANALYSIS OF VARIANCE 


source                               degrees of freedom            sum of squares 
(1)                                  (2)                           (3) 

rows (unadjusted)                    m−1                           $SS_R^{*} = \frac{1}{n}\sum_i R_i^2 - \frac{G^2}{mn}$ 
columns (unadjusted)                 n−1                           $SS_C^{*} = \frac{1}{m}\sum_j C_j^2 - \frac{G^2}{mn}$ 
treatments (adjusted for             v−1                           $SS_{tr} = \sum_k t_k Q_k$ 
rows and columns) 
error                                ν = (m−1)(n−1) − (v−1)        $SS_e = SS_I - SS_{tr}$ 
interaction                          (m−1)(n−1)                    $SS_I = SS_T - SS_R^{*} - SS_C^{*}$ 
total                                mn−1                          $SS_T = \sum_i\sum_j y_{ij}^2 - \frac{G^2}{mn}$ 

An unbiassed estimator of $\sigma_0^2$ is provided by the error mean square 
$MS_e = SS_e/\nu$. To examine the significance of treatment differences, one may use the 
customary ratio of mean squares $MS_{tr}/MS_e$, where $MS_{tr} = SS_{tr}/(v-1)$ is the 
treatment mean square. The sampling distribution of this statistic under 
randomisation, when the treatment effects are identical, is usually approximated by the Snedecor 
F-distribution with (v−1) and ν degrees of freedom. The accuracy of this 
approximation is under investigation. 


4.2. Estimation of weights for use in recovery of information. An estimate of 
$\Lambda_1 = n\sigma_0^2/(n\sigma_1^2 - \sigma_0^2)$ can be obtained, as in the case of incomplete block designs, by 
considering the expectation of $SS_R$, the adjusted row sum of squares. To compute 
$SS_R$ one has to carry out another analysis of variance ignoring rows. Let $t_1$ be any 
solution of the following equations in θ 

$\theta K_1 = Q_1$                  ... (4.1) 

where $Q_1 = T - \frac{1}{m}CN'$ and $K_1 = rI - \frac{1}{m}NN'$.                  ... (4.2) 

Then $SS_R = SS_R^{*} + SS_{tr} - Q_1 t_1'$.                  ... (4.3) 

We shall show in Section 7.3 that, writing $MS_R = SS_R/(m-1)$ for the adjusted 
row mean square, the expectation of $MS_R$ is given by 

$E(MS_R) = [1 + (n - a_1)/\Lambda_1]\,\sigma_0^2$                  ... (4.4) 

where $a_1 = \mathrm{tr}\,K_1^{*}MM'/(m-1)$                  ... (4.5) 

in which tr denotes the trace of a matrix and $K_1^{*}$ is a pseudo-inverse of the matrix $K_1$. 

Consequently, we can take 

$D_1 = \frac{(n - a_1)\,MS_e}{MS_R - MS_e}$                  ... (4.6) 

as an estimator of $\Lambda_1$ in the sense that the ratio of the expectations of the numerator 
and the denominator of $D_1$ is equal to $\Lambda_1$. An estimator $D_2$ of $\Lambda_2$ can be obtained in 
like manner. 


4.3. A positive-definite estimator of $\sigma_1^2$. If (4.4) is used for estimation of $\sigma_1^2$, 
the estimate might on occasions turn out to be negative. We propose here an 
alternative procedure which is applicable to certain types of designs and has the advantage 
of providing a positive-definite estimator of $\sigma_1^2$. 


As will be shown in Section 7.2, the least-squares equations for estimation of 
the $\theta_k$'s from row-contrasts are 


e( Mr) = (nw x, ). 2. ил) 
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Tf t* is any solution of the above equations in @ and if 
2 
= rank pr - 
р ru (MM тЕ»)<т 1 es (4.8) 


the residual sum of squares in the analysis of variance of normalised row-contrasts 
can be used to estimate о?. Thus, 


(Ең-ет)-кме" 
Ойл» - 


is a positive-definite unbiassed estimator of 02. The corresponding estimator of A, 
is Di = nMS(ns]—M 5%). 


8: = (4.9) 


5. EFFICIENCY 


It is known [Kempthorne (1956), Roy (1957)] that the average variance of 
interaction estimators of differences of the type $\theta_k - \theta_{k'}$ is $2\sigma_0^2/h(K)$, where K is the 
coefficient-matrix of the design and h(K) denotes the harmonic mean of the positive 
latent roots of K. If instead of the two-way design, a one-way design using 
columns as blocks were used, the average variance of intra-block estimates would then 
be $2[(1 - \frac{1}{n})\sigma_0^2 + \sigma_1^2]/h(K_1)$, where $K_1 = rI - \frac{1}{m}NN'$. As a measure of the 
efficiency of the two-way design in comparison with the one-way (column) design, we 
propose the ratio of the reciprocals of these average variances. This turns out to be 

$E = e\phi$                  ... (5.1) 

where $e = h(K)/h(K_1)$ will be called the efficiency-factor of the two-way design relative 
to the one-way design using columns as blocks and $\phi = 1 + \frac{1}{\Lambda_1}$. 

It will be shown in Section 7.4 that the relative efficiency-factor e ≤ 1. 
Consequently, the two-way design is effective only if $\phi > 1/e$. This parameter φ can be 
estimated by substituting the estimate of $\Lambda_1$ as obtained in Section 4. 
6. TWO-WAY DESIGNS WITH COLUMN BALANCE 

A two-way design will be said to have column balance if each treatment occurs in 
a column at most once, and any pair of treatments occur together in the same number, 
say λ, of columns; or in other words if the columns of the design regarded as blocks 
form a Balanced Incomplete Block Design. A column balanced design is said to be 
a Youden Square if the row incidence-matrix $M = E_{vm}$, and an extended Youden 
Square if $M = pE_{vm}$ where p is a positive integer, p ≥ 2. 

Shrikhande (1951) claims that all known column-balanced designs can be 
arranged in rows in such a way that (i) a partially balanced association scheme with 
two associate classes can be imposed on the treatments and (ii) the $m_{ki}$'s satisfy: 


{ ш НЕЙ 
mpi = 2. (61) 
Е и ks д, db № are w-th associates; u= 1,2. 
For a definition of a partially balanced association scheme, the reader is referred 
to Bose and Shimamoto (1952). 

If $\mu_1 \ne \mu_2$, the designs satisfying (6.1) are said to belong to the class $Y_1$ 
(Shrikhande, 1951). 

The analysis of designs belonging to the class $Y_1$ is particularly easy, being 
similar to that of partially balanced incomplete block designs with two associate classes. 
The analysis based on only the interaction-contrasts under the 'Normal' model is given 
by Shrikhande (1951); here we give the complete analysis including recovery of 
information from row-contrasts and column-contrasts. 


For the parameters of the partially balanced association scheme, we shall 


use, the standard notations рі, %; i,j,k = 1,2. Let now, 


acil (%—/) 
MOM yy АА 


1 
b= 2% (Hi— Ha) 


SI 2 
Cpi Pi 


ч 
d = т-р 


Then a solution of the equations (3.1) for a design of the class У, turns out to be 


fy = [A Q,+68,(Q,)]/D 
where S, denotes summation over first associates and 
А = a--bc 
D = аА--04. 
Since the variances of estimates of treatment differences are given by 
[2(A—b)/D]o3, if b, k' are first associates 
V(t,—tp) = { 
[24/D]e$, otherwise 
the relative efficiency factor of this design turns out to be 


ves m(v—1)D | 
Av[(v—1)A —n4b] 


For combined estimation, a solution of the equations (3.4) is 
i, = [4 бұ SOD 


where А = G+be 


(6.9) 


(6.3) 


(6.4) 


(6.5) 


(6.6) 


(6.7) 


and 


(6.8) 


= ы 
"EA, Шә). 


To estimate A, we need a solution of (4.1) which in this case turns out to be 
m 
= yy Cu 


where Q,, is the k-th element. of Q, defined by (4.2), An estimate of A, is obtained by | 
putting 
= дшш ... (6.9) 


in the expression (4.6) for D}. 
Similarly to estimate A, we need MS, the adjusted column mean square 
given by 


(n—1) ИЗ = 88:4-88,-0-6; (6.10) 

t, being any solution for @ in я 
ӨК, = Q, . ... (611) 
where oi T+ RM’ and = -2 MM’. 12 (6.12) 


In this case t, = (tə, ... to) is given by i 
ty LA Qa SIQUID" 2. (813) 


where A = Алу». and Р' = [a+(r—A)/m]4’—b’d. Ап estimate of А, 
is given by í 

_ (m—a,) M8, D 
Des ME 2. (614) 


- —1)A4'—n4b 
where. du а= Ed cn mb] 455 (6.15) 
The derivation of (6.9) and (6.15) is given in Section 7.5. 
7. DERIVATION OF RESULTS | 


ү 


Я TH this section we prove some of the results stated үе: 


7.1. Some lemmas. | i. 
Lemma 1: Af 188), i, а--1,2,..,т; j, В = 1,2, ..., т are real numbers 
Keen. “satisfy : ND ? Ag? ДЕ? = бы, Spar, where д is the Kronecker ae, 
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(b) ig? = 1/A/mn, (с) Kem = Кен), a = 1, 2, ...,m—1 and (d) Ци” = 1g”), В = 1, 2, 
so т—1 then we have: 


т-1 1 1 


iw ) 22 ls + X. 
ету D ) (7.1) 


23 т 


"S'S дав ЛЕ («-Ш)(%-і) 


й 
а=1 8=1 


‘ т бе, when В = В' =n 
and E 5169 Hg? б, = = ihe 
ups. 8 . 
Up 0 otherwise 
m двр, when оа = о/ = т 
E X IGF) (es) 5. = we (te) 
ry aera ae ij jj п. 
М, 592 0 otherwise : 


E X MUL би, бу, = Ms Dp. 


g ovy 
"These follow easily from the properties of orthogonal matrices. 


Lemma 2: Let Zag = E Ig?) уг. Then 


ü 


Et (тти--т X 0.) when a = т, B = т 
mn k 

Е(2.р) = EARB) 
> av) 0, otherwise 


where agP = X199 ei. 
ü 
Also, these Z,,’s are mutually uncorrelated and they have variances given by 


оО оту рп 
no? if В = т; а = 1, 9, ...,т—1 
(а.в) = E14) 
moi-ifa-—m;-1,2,.4n—l 
of ik a = 1,2; ...,m—1; f. 153, ...,n—X. 


These results are obtained by direct computation using the expectations and covariances 
of уз given by (2.3) and (2.4) and the properties of 128% given by (7.2). 
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Lemma 3: With Z,, and ajg?"s as defined in Lemma 2, we have 


1 z m 
E EN 70 
=1 % int т 
” ы КЕ TG. 
се DOS d estas DL сг. 
9; с; тВ ^ m ja СЯ C; "i (7.5) 
m-l m m 
=> X 2, 49-7,-1%, Rl $a 70 
Pic. в aj DE NOU n M Вя 
^ m-i ] ^ 2 
= =z айт qi — у mimi. 
Уы’ ез k у \ у і an 
> т-1 B 1 2 72 
‚= X а" Ит” = Хип’ М. 9,6 
Уы: Psi p» af mi B PU (7.6) 
БИБИ alah) go 4-2 5% Іі; 11 
= артар = TOi ОУ тыт, —— Ут, ты Г, 
Уы ИА Ek ES i Miri тул НЗ Төн D 


These results are obtained by direct computation using (7.1) and (7.2). 
7.2. Derivation of least-square equations, According to the method of weighted 
least-squares, we have to’ minimise X Ye --Е(2,3)), where the summation 
aß 


is over all values of а, 7 except x = m, f = n. On multiplication by 0$, this reduces 
to minimising 


—1 т-і n-1 
L="S S [Zep —B(Zop H ® (Zen —E(Zan PHT [Zme—E( Zing). 
а=1 B=1 N а=1 M В=1 


Equating the partial derivative of L with respect to 0, (k = 1, 2, ..., v) to zero, we get 
the equations: 
ibe Wi. 7%. Wp, W 
E ( Yaw + Vw ^ Уш ) б = 9+ Qi Qr ЕЙ 
k= 152/050 


where Q; and yy, etc., are given by (7.5) and (7.6). "This, in matrix notation is our 
equation (3.4) for combined estimation. То obtain the equation (3.1) for estimation 
from interaction-contrasts only, веб w, = w, = 0 іп (7.7). Similarly if we want 


v 
estimates from row-contrasts only, the equations would be 2 К 0, = Qi which, їп 


matrix notation, is our equation (4.7). 
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7.3. Expectation of the adjusted row sum of squares. Since the adjusted 
row sum of squares SS, defined by (4.3) is invariant funder the transformation 


yug = mÈ, еуь бъ. its distribution and therefore its expectation does not involve бу. 
Consequently in computing Е(88,) we can ignore the terms involving 0,8. Now, 
Since 

Е(88),) = (m—1)no$4-terms in бз 
and E(SS,,) = (v—1)o$--terms in був ы 167.8) 


it follows from (4.3) that all that we need now to compute H(SSp) is H(Q, t;) where 
Q, t, are defined by (4.1) and (4.2). Let Kj be a pseudo-inverse of the matrix К, 
defined by (4.2), so that a particular solution of (4.1) is t, = Q,Kj. Hence 


E(Q, ti) = Е(0,Қ Qi) = Z(r(Ki Qi О,)] 
= [Кү (О, Q1] 
= tr Ki D(Q,)-+terms іп бз 221,0) 
where D(Q,) stands for the dispersion matrix of Q,. To compute D(Q,) we express 


О, in the form О, = 9+1 R ( м'н). Since the elements of О аге interaction- 

contrasts and those of i R ( М = ) аге row-contrasts, these are uncorrelated, 

and therefore D(Q,) = D(Q)--D | i R ( ме )l . Since D(Q)— K of and 
: 2 


DR) = (1 -E ) ме $ we get on simplification 


DQ) = (к.е) ota (of-2). 20) 


Using (7.8), (7.9) and (7.10), we get finally 

E(SS,) = Е(88%)--Е(88,)-Е(0,6) 
= [n(m—1)—tr КІММ703--(” Ki ММ7)с т | 
from which (4.4) follows. i ) 


7.4. Relative efficiency factor. To prove that the relative efficiency factor 
e defined in Section 5 cannot exceed unity, we need the following result in matrix 
theory. 


Lemma: If A and B are positive-definite matrices of the same order and 
$C = A - B$ is positive definite (semi-definite), then $D = B^{-1} - A^{-1}$ is positive definite 
(semi-definite). 

Proof: We observe that if P and Q are symmetric matrices of the same order 
and P is positive definite, a necessary and sufficient condition for Q to be positive 
definite (semi-definite) is that the roots of the determinantal equation $|Q - \lambda P| = 0$ 
are all positive (non-negative). Now, the determinantal equation $|C - \lambda A| = 0$ 
is equivalent to the equation $|D - \lambda B^{-1}| = 0$. This follows by pre- and post-multiplying 
the former equation by $A^{-1}$ and $B^{-1}$ respectively. Also, C being positive definite 
(semi-definite) implies that the roots λ are all positive (non-negative), which in turn 
implies that D is positive definite (semi-definite). 
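
The equivalence of the two determinantal equations used in the proof can be written out in one line. The display below only restates that step, with λ denoting a root and A, B, C, D as in the Lemma:

```latex
\begin{aligned}
A^{-1}(C-\lambda A)B^{-1}
  &= A^{-1}(A-B)B^{-1}-\lambda B^{-1} \\
  &= B^{-1}-A^{-1}-\lambda B^{-1}
   = D-\lambda B^{-1},
\end{aligned}
\qquad\text{so that}\qquad
|D-\lambda B^{-1}| = \frac{|C-\lambda A|}{|A|\,|B|}.
```

Since |A| and |B| are positive, the two determinants vanish for exactly the same values of λ.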


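Next consider a (v−1)×v matrix P satisfying $PP' = I$ and $P'P = I - \frac{1}{v}E_{vv}$. 

Since our design is doubly connected it follows that $A = PK_1P'$ and $B = PKP'$ 
are both positive definite and $A - B = \frac{1}{n}PMM'P'$ is positive definite or semi-definite. 

Now $h(K_1) - h(K) = h(A) - h(B) = (v-1)\left[\frac{1}{\mathrm{tr}\,A^{-1}} - \frac{1}{\mathrm{tr}\,B^{-1}}\right] = \frac{(v-1)\,\mathrm{tr}(B^{-1} - A^{-1})}{(\mathrm{tr}\,A^{-1})(\mathrm{tr}\,B^{-1})} \ge 0$ 

from which it follows that e ≤ 1. 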
7.5. Hstimation of weights for two-way designs with column-balance. 
Av 
m 


[1-5] we easily get Kj = е-е IE Hence 


MM TIE from which (6.9) follows very easily using (4.5). 


Since in this case K, = 
W 
Av 

Since NN’ = (r—A)I-+-AE, we have ҚҰМЫ = (r—A) rK. Now if t = ОУ 
be any solution in 6 of the equations (6.11), it can be shown that 


Ki MM’ = 


К = tr(¥)— * (sum of all elements of Y). Since (6.13) provides a solution of the 


equations (6.11), (6.15) follow immediately. 


8. NUMERICAL EXAMPLE 


Table 8.1 gives the yields and the lay-out of a design of the $Y_1$ class with 
treatments indicated by numbers within brackets. 


TABLE 8.1. YIELDS AND THE LAY-OUT OF A DESIGN OF THE $Y_1$ CLASS (WITH 
TREATMENTS INDICATED BY NUMBERS WITHIN BRACKETS) 


row        1      2      3      4      5      6      7      8      9     10      R_i


1      140.1  161.8  112.2  153.9  116.5  189.2  160.3  152.7  178.0  134.9   1499.6
т 1% (5) (1) 4) -- (4) (6) (2) (8) (3) (1) 


2      102.6  129.2   89.5   97.4  103.9  142.5  138.8  106.9  133.3   87.9   1132.0
SOO mT о ие е B (9) 


3      155.9  165.8  138.3  141.6   79.8  141.6  161.2  136.1  155.8  107.1   1383.2
: © (3) (6) (6): (2) 0 (9 (1) (2) (5) 


C_j    398.6  456.8  340.0  392.9  300.2  473.3  460.3  395.7  467.1  329.9   4014.8

The parameters of the design are: m = 3, n = 10, v = 6, r = 5, λ = 2,
m = l, m= 4, = 9, щ = 9, Ға = 8. pu = 0, pi = 0. 


àv 1 1 
= ———(flg— tg) = 3.9 6 =— (fy — Ha) = 0.1 
a = 240 из) = 3.9 x (4149) 
c= р-р) = 0 d —n—pi-l 
А = a--bc = 3.9 D = aA—Ud = 15.2 


The computational details of estimation are given in Table 8.2. 


TABLE 8.2. COMPUTATIONAL LAY-OUT FOR INTERACTION ESTIMATES AND 
COMBINED ESTIMATES 


treat-  1st asso-   T_k     [R]_k    [C]_k    mnQ_k    mnDt_k    mQ_1k    nQ_2k   A'nQ_2k +      Q̄_k     D̂t̂_k
ments   ciates of                                                                 bS_1(nQ_2k)
  k        k

 (1)      (2)      (3)      (4)      (5)      (6)       (7)      (8)      (9)      (10)        (11)     (12)

  1        3      583.2   6646.4   1857.1   -940.2   -3543.4   -107.5   -814.4   -3965.3     -37.12   -145.29
  2        4      623.9   6897.6   1956.1  -1462.8   -5748.9    -84.4   -658.6   -3236.8     -50.04   -202.24
  3        1      689.9   6646.4   1959.8   1233.8    4717.8    109.9    252.6    1156.3      39.20    153.82
  4        2      680.1   6897.6   2022.4   -439.8   -1861.5     17.9    -96.6    -539.2     -13.45    -58.85
  5        6      686.3   6530.0   2120.0   -127.0    -321.7    -61.1    333.0    1730.1      -0.51      3.93
  6        5      751.4   6530.0   2129.0   1736.0    6757.7    125.2    984.0    4854.9      61.92    248.63

total            4014.8*  40148.0  12044.4*     0*        0*       0*       0*       0*          0*        0*
                          (= 10G)  (= 3G)


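$[R]_k = \sum_i m_{ki} R_i$, $[C]_k = \sum_j n_{kj} C_j$; $mnQ_k = mnT_k - m[R]_k - n[C]_k + rG$;
$mnDt_k = A \cdot mnQ_k + bS_1(mnQ_k)$; $mQ_{1k} = mT_k - [C]_k$;
$nQ_{2k} = nT_k - [R]_k$; column (10) $= A' \cdot nQ_{2k} + bS_1(nQ_{2k})$;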


= ах IR MR £ Dı Ds 
whet np, 1816 m-D» th [| Іар t] 


$\hat{D}\hat{t}_k = \hat{A}\bar{Q}_k + \hat{b}S_1(\bar{Q}_k)$.


*Denotes check. 
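Columns (6) to (9) of Table 8.2 can be recomputed from columns (3) to (5). The Python sketch below assumes the footnote formulas read $mnQ_k = mnT_k - m[R]_k - n[C]_k + rG$, $mnDt_k = A\,mnQ_k + bS_1(mnQ_k)$, $mQ_{1k} = mT_k - [C]_k$ and $nQ_{2k} = nT_k - [R]_k$, with $S_1$ the sum over first associates and $A = 3.9$, $b = 0.1$, $D = 15.2$ as given in the text; all variable names are hypothetical.

```python
m, n, r = 3, 10, 5          # rows, columns, replications
G = 4014.8                  # grand total
A, b, D = 3.9, 0.1, 15.2    # constants quoted in the text

first_associate = {1: 3, 2: 4, 3: 1, 4: 2, 5: 6, 6: 5}   # column (2)
T = {1: 583.2, 2: 623.9, 3: 689.9, 4: 680.1, 5: 686.3, 6: 751.4}       # column (3)
R = {1: 6646.4, 2: 6897.6, 3: 6646.4, 4: 6897.6, 5: 6530.0, 6: 6530.0}  # column (4)
C = {1: 1857.1, 2: 1956.1, 3: 1959.8, 4: 2022.4, 5: 2120.0, 6: 2129.0}  # column (5)

# Column (6): adjusted treatment totals, scaled by mn.
mnQ = {k: m * n * T[k] - m * R[k] - n * C[k] + r * G for k in T}
# Column (7): each treatment has one first associate, so S_1 is a single term.
mnDt = {k: A * mnQ[k] + b * mnQ[first_associate[k]] for k in T}
# Columns (8) and (9).
mQ1 = {k: m * T[k] - C[k] for k in T}
nQ2 = {k: n * T[k] - R[k] for k in T}
# Treatment estimates t_k, to be compared with Table 8.4.
t = {k: mnDt[k] / (m * n * D) for k in T}

for k in sorted(T):
    print(k, round(mnQ[k], 1), round(mnDt[k], 1), round(mQ1[k], 1), round(nQ2[k], 1), round(t[k], 2))
```

The zero column sums marked with an asterisk in the table serve the same purpose as the assertions one would write against this sketch.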


If we want interaction estimates only we need proceed only up to column
(7) in this table. The analysis of variance table may be prepared at this stage as
shown in Table 8.3.

TABLE 8.3. ANALYSIS OF VARIANCE

source                        degrees of    sum of               mean               F
                              freedom       squares              squares

rows (unadjusted)                 2         SS_R =  7059.34
columns (unadjusted)              9         SS_C = 11753.55
treatments (adjusted for
  rows and columns)               5         SS_t =  2204.15      MS_t = 440.83      3.39
error                            13         SS_0 =  1690.66      MS_0 = 130.05
interaction                      18         SS_I =  3894.81
total                            29         SS_T = 22707.70
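The unadjusted sums of squares in Table 8.3 follow directly from the yields of Table 8.1. The Python sketch below recomputes them; the yields are those of Table 8.1 with the first-column entries taken as 140.1, 102.6 and 155.9 (values implied by the printed row and column totals, stated here as an assumption), and the variable names are hypothetical.

```python
yields = [
    [140.1, 161.8, 112.2, 153.9, 116.5, 189.2, 160.3, 152.7, 178.0, 134.9],
    [102.6, 129.2, 89.5, 97.4, 103.9, 142.5, 138.8, 106.9, 133.3, 87.9],
    [155.9, 165.8, 138.3, 141.6, 79.8, 141.6, 161.2, 136.1, 155.8, 107.1],
]
m, n = len(yields), len(yields[0])

row_totals = [sum(row) for row in yields]          # R_i
col_totals = [sum(col) for col in zip(*yields)]    # C_j
grand = sum(row_totals)                            # G
cf = grand ** 2 / (m * n)                          # correction factor

ss_rows = sum(R * R for R in row_totals) / n - cf
ss_cols = sum(C * C for C in col_totals) / m - cf
ss_total = sum(y * y for row in yields for y in row) - cf
ss_interaction = ss_total - ss_rows - ss_cols

print(round(ss_rows, 2), round(ss_cols, 2), round(ss_interaction, 2), round(ss_total, 2))
```

The interaction line is then split into the adjusted treatment sum of squares and the error sum of squares as described in the text.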


For recovery of information from row-contrasts and column-contrasts further compu- 
tations may be made as shown in the columns (8) to (12) of Table 8.2. The 
constants required are given below. 


Al = A+(r—A)/m = 49, Di — ит’ фа = 24, 


ш ну, Ere DA mb] ое, 
а= A(m—1) -9; (m—1)D'- 
We obtain 
Q, ti = 1402.40 Qt, = 4607.75 
SS, = SS8*54-88,—Q,t, = 7861.09 SS, = S85-I-88,—Q;,t;— 9349.95 
(п--а)) М8, _ n, — (M—a)M So 3808 
7s, MS, = 0.3251 Ds, = MSMS. = 0.380 
pM HA = 12154 баршы Die" 4 — lg) = 0.09685 
a= ЖОКТУ, “ур; (Ho—H#2) 579; iB D, 1749) 
A= 4--бе-- 4.01579, . D = àA—Ud = 16.11719. 


The two sets of estimates are given below in Table 8.4. 


TABLE 8.4. ESTIMATES OF TREATMENT EFFECTS

 k        t_k        t̂_k

 1      -7.77      -9.01
 2     -12.61     -12.55
 3      10.35       9.54
 4      -4.08      -3.65
 5      -0.71       0.24
 6      14.82      15.43

Estimated variances of the estimates of treatment differences for the two sets are
obtained as shown below.

\[
\text{Est. } V(t_k - t_{k'}) =
\begin{cases}
\dfrac{2(A-b)}{D}\, MS_0 = 65.03, & \text{if } (k, k') \text{ are first associates} \\[2mm]
\dfrac{2A}{D}\, MS_0 = 66.74, & \text{otherwise}
\end{cases}
\]

\[
\text{Est. } V(\hat{t}_k - \hat{t}_{k'}) =
\begin{cases}
\dfrac{2(\hat{A}-\hat{b})}{\hat{D}}\, MS_0 = 63.24, & \text{if } (k, k') \text{ are first associates} \\[2mm]
\dfrac{2\hat{A}}{\hat{D}}\, MS_0 = 64.81, & \text{otherwise.}
\end{cases}
\]

This shows that recovery of information from row-contrasts and column-contrasts 
has not resulted in appreciable gain in precision. 
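The four variance figures can be reproduced from the constants quoted in this section. The sketch below assumes the formulas $2(A-b)MS_0/D$ for first associates and $2A\,MS_0/D$ otherwise, with $A = 3.9$, $b = 0.1$, $D = 15.2$ for the interaction estimates and $\hat{A} = 4.01579$, $\hat{b} = 0.09685$, $\hat{D} = 16.11719$ for the combined estimates; the function name is hypothetical.

```python
MS0 = 130.05  # error mean square from the analysis of variance


def variance_of_difference(A, b, D, first_associates, ms=MS0):
    """Estimated variance of the difference between two treatment estimates."""
    weight = 2 * (A - b) / D if first_associates else 2 * A / D
    return weight * ms


interaction_only = (
    variance_of_difference(3.9, 0.1, 15.2, True),
    variance_of_difference(3.9, 0.1, 15.2, False),
)
combined = (
    variance_of_difference(4.01579, 0.09685, 16.11719, True),
    variance_of_difference(4.01579, 0.09685, 16.11719, False),
)
print([round(v, 2) for v in interaction_only + combined])
```

The ratios of the corresponding variances are about 0.97, which is the sense in which the recovery of information gives no appreciable gain here.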


Next we compare this design with the design formed by taking columns as
blocks. The relative efficiency factor $e$ turns out to be 0.97938. Since $\phi$
= 4.07598 we get $E = e \times \phi = 3.99$ as the efficiency of the two-way design relative
to the column-design, which shows that the gain in precision is appreciable.


ON SOME PROBLEMS OF FRACTILE GRAPHICAL ANALYSIS


By N. M. MITROFANOVA
Mathematical Institute of the Academy of Sciences, USSR


SUMMARY. Professor Mahalanobis has raised some nonparametric statistical problems in his
paper “On fractile graphical analysis” (Mahalanobis, 1959). We consider here some of them under
somewhat restrictive conditions on the random variables under consideration.


1. NOTATIONS


Consider any bivariate population of two random variables X and Y with
the distribution function F(x, y). Draw two independent samples from this population,
each consisting of ng units. Rank these pairs of values of the two variables in the
order of ascending values of x,

\[ (x_1, y_1), (x_2, y_2), \ldots, (x_{ng}, y_{ng}), \]
\[ (x'_1, y'_1), (x'_2, y'_2), \ldots, (x'_{ng}, y'_{ng}), \]

and form the functions

\[ L(x) = L_i(x) = \bar{y}_{i+1}(x-i+1) - \bar{y}_i(x-i), \quad i-1 \le x \le i, \quad i = 1, 2, \ldots, (g-1) \qquad (1) \]
\[ L'(x) = L'_i(x) = \bar{y}'_{i+1}(x-i+1) - \bar{y}'_i(x-i), \quad i-1 \le x \le i, \quad i = 1, 2, \ldots, (g-1) \qquad (2) \]

where
\[ \bar{y}_k = \frac{1}{n} \sum_{j=(k-1)n+1}^{kn} y_j, \qquad \bar{y}'_k = \frac{1}{n} \sum_{j=(k-1)n+1}^{kn} y'_j . \]
2. PROBLEM I
Denote
\[ a_i = \int_{i-1}^{i} [\max(L_i, L'_i) - \min(L_i, L'_i)]\, dx, \quad i = 1, \ldots, (g-1) \qquad (3) \]
\[ a = \int_{0}^{g-1} [\max(L, L') - \min(L, L')]\, dx = \sum_{i=1}^{g-1} a_i . \qquad (4) \]

In the terms of the paper mentioned above this is an error area a(1, 2) to be
associated with the combined fractile graph G(1, 2).
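The construction of the two fractile graphs and of the error area can be sketched in a few lines of Python. The sketch assumes the graphs are the piecewise-linear functions joining the successive group means, so that on each unit interval the integrand is the absolute value of a linear function and can be integrated exactly; the function names are hypothetical.

```python
def group_means(xs, ys, g):
    """Rank the pairs by x and return the means of y in the g fractile groups."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs) // g
    if n * g != len(pairs):
        raise ValueError("sample size must be a multiple of g")
    return [sum(y for _, y in pairs[k * n:(k + 1) * n]) / n for k in range(g)]


def segment_area(d0, d1):
    """Integral over a unit interval of |d0 * (1 - u) + d1 * u|."""
    if d0 * d1 >= 0:
        # The two graphs do not cross: the region is a trapezium.
        return (abs(d0) + abs(d1)) / 2
    # The graphs cross once: the region is a pair of triangles.
    return (d0 * d0 + d1 * d1) / (2 * (abs(d0) + abs(d1)))


def error_area(means1, means2):
    """Area between the two fractile graphs built from the group means."""
    d = [a - b for a, b in zip(means1, means2)]
    return sum(segment_area(d[i], d[i + 1]) for i in range(len(d) - 1))


print(group_means([3, 1, 2, 4], [30, 10, 20, 40], 2))  # two groups of two units
print(error_area([0.0, 1.0, 2.0], [0.0, 0.0, 0.0]))    # graphs do not cross
print(error_area([1.0, -1.0], [0.0, 0.0]))             # graphs cross once
```

Treating the crossing and non-crossing cases separately keeps the integration exact instead of relying on numerical quadrature.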


First of all we shall be interested in Ea and Da.

3. CASE (i): X AND Y ARE INDEPENDENT
Let the random variables X and Y be independent, so that $F(x, y) =
\Phi(x) \cdot F(y)$, $\Phi(x)$ and $F(y)$ being the distribution functions of X and Y respectively.

The relation

\[ (ng)! \int \cdots \int_{x_1 \le \cdots \le x_{ng}} d\Phi(x_1)\, dF(y_1) \cdots d\Phi(x_{ng})\, dF(y_{ng}) = dF(y_1) \cdots dF(y_{ng}) \]

shows that the variables $Y_1, Y_2, \ldots, Y_{ng}$ are independent, and $P(Y_i < y) = F(y)$,
$i = 1, \ldots, ng$. The random variables $Y'_1, Y'_2, \ldots, Y'_{ng}$ have the same distribution
functions and are also independent.

Let $EY = m$ and $DY = \sigma^2$. Since $\max(L_i, L'_i) - \min(L_i, L'_i) = \max(L_i - m,
L'_i - m) - \min(L_i - m, L'_i - m)$ we have the right to suppose further $m = 0$.


М Evaluation of Ea. 
91 oe 1 i 
Еа = X Еа = X | ГЕ max (L; L;) —E min (Li Lj) \da. АТЫС) 
ізі i=l i1 
Consider E max (Lj, 14) = Т; 516) 


Тһе random variables Г; and L; are identically distributed, therefore, 


т, = | «ИР < әр. 


Making the substitution z= dm Да-а 
we get т; БЕ МЕЕН af | "ешр... 0) 
уп 4 
h A Nou. ; 
where G,(u) = P ( Ртг исин І,<ч ) 


According to Liapunov's theorem 


ve 
) G, (w) > Gu) = = І e ? dt as n— оо. 
ж 
Непсе vet рата [vitio ... 8) 


М Оп the other hand, obviously 
4 . 
mint (2. % Tis Ji) < max (14, 1/) < max (fp Ji. Янь dia 
Hence |< fz ЧР e 2 <4 [|2 |Р <=)


S £y [APG <] = 4 үрд 4-72. 


Т Vat) o. } e 9) 
c 
(8) and (9) enables us to assert that 3 A 
Г VaT nis окы 
| а > | М1 — а)? dx. | мб ав n> co. 
1 0 2 
If T; = Е min (Г, 14) $ 


then we would find: 


ра 


a Wh 1 1 E { 
2 | уші й: > | МИ а) de. | ud[1—G(u)f as n— oo. 
0 


1-1 


After some obvious simplifications finally we obtain
\[ Ea = \frac{2\sigma(g-1)}{\sqrt{\pi n}} \int_0^1 \sqrt{x^2 + (1-x)^2}\, dx \cdot [1 + o(1)]. \qquad (10) \]
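The integral appearing in (10) can be evaluated in closed form. The Python sketch below assumes that the leading term of (10) reads $2\sigma(g-1)(\pi n)^{-1/2}\int_0^1\sqrt{x^2+(1-x)^2}\,dx$, which is what a normal approximation to the difference of the two graphs gives, and compares a midpoint-rule value of the integral with the closed form $\tfrac12\{1+\log(1+\sqrt2)/\sqrt2\}$; the function names are hypothetical.

```python
import math


def path_integral(steps=100_000):
    """Midpoint-rule value of the integral of sqrt(x^2 + (1-x)^2) over [0, 1]."""
    h = 1.0 / steps
    return h * sum(math.hypot((k + 0.5) * h, 1.0 - (k + 0.5) * h) for k in range(steps))


def closed_form():
    """Closed form of the same integral."""
    return 0.5 * (1.0 + math.log(1.0 + math.sqrt(2.0)) / math.sqrt(2.0))


def asymptotic_mean_area(sigma, n, g):
    """Leading term of Ea under the assumed reading of (10)."""
    return 2.0 * sigma * (g - 1) / math.sqrt(math.pi * n) * closed_form()


print(round(path_integral(), 6), round(closed_form(), 6))
print(asymptotic_mean_area(sigma=1.0, n=100, g=10))
```

The integral is about 0.8116, so for fixed g the expected error area decreases like $n^{-1/2}$.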


Evaluation of Da. It is easy to verify that


ү | < шах ГЕ) р. 


k 
and thus [а |< (0—1) max (| ®— |). 
Denote j pint a= | TI. |. 
Since all the 5/2 аге independent and have the same весия functions H(z) 


ва < 0—1) (нор < TE eli) 


EN П ` 


= g(g—1*EZi = 99-1) BG. — He)” = d uc = И 


Непсе 


Ку та)-0(%). ; T) 
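4. CASE (ii): X AND Y ARE BIVARIATE NORMAL
Let a bivariate random variable have the density function

\[ p(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x-m_1)^2}{\sigma_1^2} - \frac{2\rho(x-m_1)(y-m_2)}{\sigma_1\sigma_2} + \frac{(y-m_2)^2}{\sigma_2^2} \right] \right\}. \]

Denote
\[ p(x) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left\{ -\frac{(x-m_1)^2}{2\sigma_1^2} \right\}, \]
\[ p(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma_2\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2\sigma_2^2(1-\rho^2)} \left( y - m_2 - \rho\frac{\sigma_2}{\sigma_1}(x-m_1) \right)^2 \right\}. \]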


Further, we shall keep $x_1, \ldots, x_{ng}$ and $x'_1, \ldots, x'_{ng}$ fixed with

\[ x_1 < \cdots < x_{ng}, \qquad x'_1 < \cdots < x'_{ng}. \]

Then the conditional densities of $Y_1, \ldots, Y_{ng}$ and of $Y'_1, \ldots, Y'_{ng}$
will be of the form

\[ \frac{(ng)!\, p(x_1, y_1) \cdots p(x_{ng}, y_{ng})}{(ng)!\, p(x_1) \cdots p(x_{ng})} = p(y_1 \mid x_1) \cdots p(y_{ng} \mid x_{ng}) \]

and $p(y'_1 \mid x'_1) \cdots p(y'_{ng} \mid x'_{ng})$ respectively.
Thus we have two series of independent variables. For the sake of shortness
we shall denote them by the former letters $Y_i$ and $Y'_i$, $i = 1, \ldots, ng$, with their new
sense in mind.

Evaluation of Ea. As before
\[ |a_i| \le \max_k |\bar{y}_k - \bar{y}'_k| = \max_k Z_k, \]
\[ |Ea| \le (g-1)\, E \max_k Z_k. \qquad (13) \]
All the $Z_i$ are independent; let $H_i(z)$ be the distribution function of $Z_i$. Then
AMET 2 
Е шах 2; = È | z IL Hy) ан) < È [ анк) 


ААА 2 
< = v Ганца) < E v Ey — 3. es (14) 
= 
Let us evaluate the last expected value. 


Е. (9: —99* = Р, (G,—H) + Bee (9—90). 
Tt is easy to see thai 


ўе ( m+ e (2;--т,), а ) 


PR рг) 
and GEN (т. TP т), BVI ) 
1 
1 іп » 1 in 
where t= az. = T ә. 
Ы je(i-1)n41 ^ jed-ipa 
Therefore


MES 203(1—p?) - EE = 
ее 
т о? 
and the unconditional expected value is 


Ы —p 252 
К-т = ыш E = D(z,—m). ARI 
1 


Let E' denote the conditional expected value when 
"Xx aX auk. S а, 
аге fixed. Тһе conditional density will be 


N! pr iyi)... platy) A 


{+1 ) 
Г вай 
%%-1% 
Тһеп 
іп 2 
"( м ют) 
(Dna 


in 2 in 

! " е 

= а [ ( > ет) [| senes; 
(14 qa)da: ) 4445... лр, SD Ginya 


Vi-1)n 


ж 
inl 


roe m 2 ey 
= | -f ( > в) ll D(x; dar, 


x 

2% Б . {ты (іт 
a)da: 2-1)" 94-1» 

(7 p(x) 


%4-10т С 
iy 
pam J, (2--ті)р(еу4ш 712 
zi REN | 2 (ш—ту)?р(ж)йж--(##—т) 26-1. 
Zine я Vins 
T E 17 ЕГІСТІ 
9-1) 3 -1)n 
And similarly Ка | 
іп f^ қ i i p 
ғ( 2u (i) КТС | (2—т.)р(а)ах. 
+. ы " 
(DAE zo - < SJ- -plede ла Morbo 
ЖҮЛ 5 


The conditional variance will be equal to 


inst 2 Vingl-- 
2 Г” турада [^ Gr тора 
j ic ec сз: NUNC Vom (PLC cups UEM Б 
D ( 02 (ж; ~) т А ДЕ ^g. 2а 
NOE Г poda | ree 
T - In ?(-1n 
or 
В . 1 
РЭ Tinkl y 
| Т“ (22) pede рі CL мшш! | 
in 2-1) о 94-1) 1 
D' ( 92 (0) —m4) )= от + - e = 
«ту nga Ving 
І p(a)da f p(a)da: 
"-Yn 2. %%-1) 
(16) 
where the expression іп braces does not exceed 
SN ans T 
“Б. : 
(7 ына)" ys.) 
Tli- 
(16) and (17) imply that 
in : d (i-Dn 
D( V jm) ) «оф 99" | Пре de; 
God SY Sree о 
inet п-2 
/ ( [s sous) Il pa?) da* 
\ ^ 3-1 us 
! —2)! 
< сїт.2 e = < const g*oin. 15: (18) 
From (13), (14), (15)-and (18) we have Ha = О [mes ). vase (19) 


E À : 
Evaluation of D(a). We have 


ERUS 
a? < (g—1) x a; < (0—1) (max Z)? 


ccs д) = 5 Га bi нцданца) < 5{ 2a (2) = É ВИ. 
va E y = ісі 
Thus — | Юк СЭ (2)


and the first hypothesis of Mahalanobis (1959) is true. 


5. PROBLEM II


Let us mix together the two samples we spoke of at the beginning of this note
to form a combined sample and again rank the sample-units in the order of ascending
values of x. Divide the new series of y's into g groups of equal number 2n as we did
before and construct the function

\[ M(x) = M_i(x) = \bar{z}_{i+1}(x-i+1) - \bar{z}_i(x-i), \quad i-1 \le x \le i, \quad i = 1, 2, \ldots, (g-1), \]
$\bar{z}_i$ $(i = 1, \ldots, g)$ being the mean value of y's in the i-th group.

The second hypothesis of P. C. Mahalanobis says that the graph M “would
tend statistically to lie more and more within the area bounded by the two graphs”
L and L′. To confirm this hypothesis to be true in the case of normal correlation
between X and Y we shall show that the expected length of the projection of the part
of this graph which lies outside the area bounded by the graphs L and L′ tends to
zero as $n \to \infty$ for constant g. For simplicity we suppose g = 2 and first consider the
case of independent X and Y.


a mo, 
Хи X gt. 
5--1 2n—k+1 


= (ЕЗ an—k Lu 19,2 
: Z = —(Xy УЕ 
Let XeN(0, 1), YeN(0, 1) and Z, as a F à Yi”), Za pn. 


(The sense is clear). Such a method of constructing Z,, and 2, is equivalent to 


а < eh, Сард 
U= or 4 


te “ м” 
(аз, Сар За 


the event 


р Р (Жш 
It is easy to verify, that Р(А)- Qc 
n 


iat 9 дЫ 
Let = t* Бе fixed. 
TEN. ec qus 


We are interested in the mean length $l$ of the projection (inside the interval
[0, 1]) of the part of the line $y = (1-x)Z_1 + xZ_2$ which lies outside the area a.


Ш-ЕВ(Це)<2%2 [mj 
k=0 p 


as in any case 0<1<1 
| ваз ара») < Р(4,) 
Ar 
n—Jninn ” 
and weget ШЗ? >| РАЙ? > | ишаара). "ECT 
k=0 k=n—Jnlnnt+1 Ар 
n— Jnlnn 25 Jninn i ЖГ,
Further, we have У РАЮ < Fay ( 50b. 2j ; 
4n 0 


2—0 
һ-/ліпт =y 2n: 
1 1 E 
03, т SS di 
x z КЕЙ 1А 
n 12/2 1 
< A | e | dt=O (exp —{(\/21пл)?) = ( ) 
М?л v Sion 
n- ynin 1 À 
Thus БУ Рр = 0 ( а) 0 (22 
10 
Let п— Мп 97 
Ху" 
and Z = ЕЯ xe nil Arh nm 
2n 2n 2 
SAU AM S. 
Z= В ia^ rn^ Ш, 
2 2n 2n 2 
(Inn)! 


Note that deN ( 0,0 ). (The letter С, the same for all cases, will denote 


2214 
various constants inessential for us). 


If XeN (o. с 1) and e > 0is a small number then 
т 


Р (> s, XI) 


созо 1 1 1 )-? (cm ep npe) 


|y |'** 
п3/4 ms nato 8-27 


Inn ҮЛ Jn: РЕ 
- P(c > prs, inp < 66) + 
RISO 


nusen 1—4 
16 
т 


+2(° g dn) НЕГЕ тн ote) 


(м <o mayne) (оба ІІ > ) 


we ^ 16 


1 
(Ing 8*9 (Inn)¥8 
< п@-їёюлвй+ә + ^ “yates E|£|- 


Here $\xi, \eta \in N(0, 1)$. Thus we have the following lemma.
Lemma: Under the conditions mentioned above we have
ii. (Inn)* .. (23 
Р (101 1) о o( "m m 


for the proper positive constants a and b. 
Now let us investigate E(l|a*) for п—\/ тіпт < k <n. 


Let W= non and W,— в. 
According to our lemma 
P(|3|>| Ml, or |а|, or 18| > zd —W,|** or 3] > [Ww] ) 
(Inn)? 24 
лт m 
Е(Цаж) = . | dF (1) i | тағ!) 
lal > || lèl < |W: 
ог |8| > | Wal І8| < 11 
ог ПРА 18| Р-Р 
er 18] >4|йї—›| 18] < a|Wi—W;] 
(ш) тат 2. (25) 
Г м ( 
181 < [Wi] 
18] < |9:| 
18] Дт 
< и: т’; | 
When |ô| < |Wi|, |8| < W, we have two points of intersection whose abscissae d, 
1-6 m W,+6 
and z, are given by 2, = СР, СЫ and 2, = - Wie 
Sats 20(W;--W,) 
so that tı, = Woe 
ЗИ. | 
Therefore we have l= |z,—z,| < С а А 
1 П 


< nis ]"W,— We . 
It is not difficult to verify that $W_1 + W_2$ and $W_1 - W_2$ are independent. Consider


za 1 
и = yn (+) and v= yn (W,—W,) then и, veN(0, 1) and 1 < C uus Іш: 


The random variable $u/v$ has the Cauchy distribution with the density
\[ \frac{1}{\pi(1+z^2)} \]
and hence $E\,|u/v|^{1-\epsilon}$ exists for any $\epsilon$, $0 < \epsilon < 1$.

Now let us choose $\lambda > 1$ in such a way that $(1-\epsilon)\lambda = 1-\epsilon_1$, $0 < \epsilon_1 < 1$.


According to Holder’s inequality for А+ : 


x (1er 977") [а 


ға 
The existence of E | w|^—! is trivial. Thus 


| МЕЧ) < Са for n > М. ... (26) 
< | Wal 
lèl < I Wal i 
Jal < ia Li pn 
[8| € 11W;—W;l 
Thus from (25) and (26) we have (12%) = О ( e for all z*. .. (27) 
Cn P 1 е 
P(A,) = [СЁ — ^ - (28 
тах (A) Ce al ) 1 (28) 
| n 
ex] Уо [камри = x. Erro 
i k=n— Jninn+1 Ak 
then х<0 (ЖАР ) (max Pi} vinn ... (30) 
к 
Thus (27), (28) and (30) show E — 0 (>r узо. Е 


(21), (22) and (31) prove our assertion. 


Now it is clear how to paraphrase the above proof for the normal correlation 
between X and Y. In this case we would have 


вех (£ ( Saf b 4),6 бер) 


2n \ nti que 


and the Lemma is true for X = W,—W,. Then though the density function for 

“ы- BL А 

> will not be of Cauchy type, it is not difficult to prove that Æ | ^ |1-* still exists. 
т 


SOME REMARKS CONCERNING THE EXPECTATION OF
THE ERROR AREA IN FRACTILE ANALYSIS


By YUKIYOSI KAWADA
University of Tokyo


SUMMARY. Expressions for the expectation of the error area in Fractile Graphical Analysis
with samples from a bivariate normal population are found in small samples. An approximation for
large samples is given. The empirical results obtained at the Indian Statistical Institute compare
favourably with the theoretical results.


Let Z be a normal random variable with mean $m$ and variance $\sigma^2$.
Let us denote the expectation of |Z| by

\[ E(|Z|) = F(m, \sigma^2). \qquad (1) \]

In case $m = 0$ we have $F(0, \sigma^2) = \sqrt{2/\pi}\,\sigma$.
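For general m the function F(m, σ²) has the well-known closed form of the mean of a folded normal variable, $E|Z| = 2\sigma\varphi(m/\sigma) + m\{2\Phi(m/\sigma) - 1\}$, where $\varphi$ and $\Phi$ are the standard normal density and distribution function. The Python sketch below evaluates it; this closed form is a standard result quoted here for illustration and is not taken from the paper, and the function name is hypothetical.

```python
import math


def expected_abs_normal(m, sigma):
    """E|Z| for Z normal with mean m and standard deviation sigma (folded normal mean)."""
    u = m / sigma
    density = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)   # phi(m / sigma)
    distribution = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))     # Phi(m / sigma)
    return 2.0 * sigma * density + m * (2.0 * distribution - 1.0)


print(expected_abs_normal(0.0, 2.0))   # equals 2 * sqrt(2 / pi)
print(expected_abs_normal(10.0, 1.0))  # close to |m| when |m| is large relative to sigma
```

The function is symmetric in m, and reduces to $\sqrt{2/\pi}\,\sigma$ at m = 0 as stated in the text.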


If $\sigma^2$ is a constant but $m$ is a random variable with mean 0 and variance $\tau^2$ then the
expectation of $F(m, \sigma^2)$ can be expressed by

(2—9) із 


Е(Е(т, 0*)) = M | Î Il ra d Ур 


where , Fy (2) - 1+5-1. E) 


Now let $Z_1$, $Z_2$ be independent normal random variables with means $m_1$, $m_2$
respectively and with the same variance $\sigma^2$.


Consider the statistic

\[ \alpha = \Delta(Z_1, Z_2)\, \frac{|Z_1 Z_2|}{|Z_1| + |Z_2|} \qquad (4) \]

where
\[ \Delta(a, b) = \begin{cases} 0 & \text{if } ab > 0 \\ 1 & \text{if } ab < 0. \end{cases} \]

Let us denote the expectation of $\alpha$ by

\[ E(\alpha) = G(m_1, m_2, \sigma^2). \qquad (5) \]


Tn сазе m, = m, = 0 we have ` 640, 0, 0%) = 2 ат 


° 
where 6 = 175; {J 


© 


[+( Г ГИ g C P. aude | 

УС CESON 

(ДӘГЕ 2)) = 0.1882. ... (6) 
#( 1- 75 log (1-+-v2)) 


If $\sigma^2$ is a constant but $(m_1, m_2)$ is a two dimensional normal random variable with means
(0, 0), variances $(\tau_1^2, \tau_2^2)$ and correlation coefficient $\rho$ then the expectation of
$G(m_1, m_2, \sigma^2)$ can be expressed by


© о 0 o о 
HG meo cer | [ИГТ 


x, 210] e-ga (ечеи) — 


EIU (- pue 
ШЕГІ Муй 


ЕЗ +] dudvdady 
ЖУ „— зр (HHn dedy 


wae А = (ото) ре 


By means of polar coordinates we have also 


Е(б(т,, ть, о®)) = a | | віп 0 сов 0 


where 0) = (0°--т8) сов? 0--2ртүг, sin 0 cos 0--(0°2-1-7%) sin?6. 
Let us put 


Tic dz ті-ті ке __2ртүт;_ 
зай 202-173-473 


(7) 


If $ (¢ = 1,2) aro small we can expand flay? = y! (14-е sin 20-4-6 cos 29) * 


in power series in $\epsilon$ and $\delta$ and then we have


B(G(m,, т, 02)) = 2 1--3 „у 1боу-1 1 
(біт, mj e) = А (а, 15 e IPIS ety ess. 
Substituting А and y by A = у(1—%—8°), 


i-( Lege ( 7 1) 447, (= )) x (12 E SNB we have finally, 
Вт ma 02) = 2 age (1а (2) (3) 
x (1-25 є+ төсі o_o...) 


2 
If we neglect the higher order terms in * (i = 1, 2) then we have 


Bien, ть 0) = Ner (pene s (8) 

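Let $(x_i, y_i)$, $i = 1, \ldots, n$; $(x_i^*, y_i^*)$, $i = 1, \ldots, n$ be 2n independent observations
on the same two dimensional normal variable with means, variances and correlation
coefficient $\mu$, $\nu$, $\sigma_1^2$, $\sigma_2^2$ and $\rho$. Rearrange the first sample as follows

\[ (x_{(1)}, y_{(1)}), \ldots, (x_{(n)}, y_{(n)}) \quad \text{so that} \quad x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}. \]

Let $n = mg$ where m and g are integers. Define

\[ \bar{y}_i = \frac{1}{m} \sum_{j=(i-1)m+1}^{im} y_{(j)}, \quad i = 1, \ldots, g, \]
\[ \bar{x}_i = \frac{1}{m} \sum_{j=(i-1)m+1}^{im} x_{(j)}, \quad i = 1, \ldots, g; \]

$\bar{y}_i^*$ and $\bar{x}_i^*$ are defined similarly. Let us denote $\bar{y}_i - \bar{y}_i^*$ by $z_i$ and
$\rho\frac{\sigma_2}{\sigma_1}(\bar{x}_i - \bar{x}_i^*)$ by $w_i$ for $i = 1, \ldots, g$.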

Plotting the points $\bar{y}_1, \ldots, \bar{y}_g$ at the equidistant points 1, ..., g and joining
them successively we get a curve G known as the fractile graph of the first sample.
The fractile graph G* of the second sample is likewise obtained. The area A between
the graphs G and G* between the ordinates at 1 and g is defined as the error area
between the two samples. We easily construct the algebraic expression for the
error area A as

\[ A = \sum_{i=1}^{g-1} \left\{ \frac{|z_i| + |z_{i+1}|}{2} - \Delta(z_i, z_{i+1})\, \frac{|z_i z_{i+1}|}{|z_i| + |z_{i+1}|} \right\}, \qquad (9) \]

where $\Delta(a, b)$ equals 1 when $ab < 0$ and 0 when $ab > 0$.
Let $(x_1, \ldots, x_n, x_1^*, \ldots, x_n^*)$ be fixed. Then we see that $z_1, \ldots, z_g$ are independently
normally distributed with the same variance $\sigma^2 = 2\sigma_2^2(1-\rho^2)/m$ and means $w_1, \ldots, w_g$.

Applying (1) and (5) we get the conditional expectation
$E(A \mid x_1, \ldots, x_n, x_1^*, \ldots, x_n^*)$.

Hence the unconditional expectation is obtained as
\[ E(A) = \sum_{i=1}^{g-1} \left\{ \tfrac{1}{2} E(F(w_i, \sigma^2)) + \tfrac{1}{2} E(F(w_{i+1}, \sigma^2)) - E(G(w_i, w_{i+1}, \sigma^2)) \right\}. \qquad (10) \]
i=1 


Let us consider the case where m is large. Then we can consider $w_1, \ldots, w_g$
as normal variates with means (0, ..., 0). Let variances and correlation coefficients
of $w_1, \ldots, w_g$ be denoted by $\tau_i^2$ $(i = 1, \ldots, g)$ and $\rho_{ij}$ $(i, j = 1, \ldots, g)$. Assuming
$\tau_i^2/\sigma^2$ $(i = 1, \ldots, g)$ small we can apply (2) and (8), and we get

—1 2 crit. En 
WA) Us оу {Пааво с} o an 


where $\theta = 0.1882$ as before and


l 4 Ti 
BUE Ean(t)a-a-h Og i а SL 


(12) 


o= S peni Tin , 
9—1 ia 2074-12-72, 


By the well known results of Е. Mosteller (zu) -o Viae eds 


approximately normally distributed with means /,--0,/4, variances сё МА) ала 


; UM nfo(u,)* 
: I-A, 
covariances 0% жаа) (ғ << в) 
where мелі Мә = dy fle) = ж ey 


м?т 
Then we have тз 20 Т Мр
i n d^ gi 4(1-/?) 
wolf X мм, мам 1 


кл, Ли, Р) ) foe.) 


c ЫН Hs нк TNT 


1 
Ponka | M azim Jolt) 


Tre Gom Јо(и,) 


АНТ ЕЕЕ SM АЛ: 
апа F,(% (= ү ме nd fora — 9,0-1. 
2р ата 0000 Эрын 86! for і--2,...,40-2. .. (18) 


2о%--(тї-Ет л) — 2g(1—p*) HH k) 


Moreover, if g is also sufficiently large, $\tau_i^2/\sigma^2$ $(i = 1, \ldots, g)$ are certainly small and we can
apply (8) as we have assumed. We may also approximate the values $k_i$
and $\rho_{i,i+1}$ by

ШІЛ A) КАШ 1 E a aw 
Fou A 9 3g ГМ ге = 


АФ (ал) 
Рота Ki бұд = шау ич) ... (14) 


Though it is not correct we may apply these formulae also for $k_1$, $k_g$, $\rho_{12}$ and $\rho_{g-1,g}$
in B and C without affecting the result much since g is sufficiently large.

By (11), (12), (13) and (14) we can compute the value of E(A).

Consider the empirical results obtained at the Indian Statistical Institute, 
(Mahalanobis, 1958). Samples were taken from a bivariate normal population with vari- 
ances and correlation coefficient equal to 1, 4/2 and 2: respectively. The empirical 
result shows e.g., 

$\sqrt{m}\, E(A) = 1.09 \pm 0.04$ for g = 16, m = 12
$\phantom{\sqrt{m}\, E(A)} = 1.07 \pm 0.05$ for g = 16, m = 24.
The agreement between the figures is fairly good.


For the case g = 16 we find, from (11)-(14), that B = 0.068, C = 0.092 and
$\sqrt{m}\, E(A) = 1.00$.

It is suggested that a modified statistic, namely $\tfrac{1}{2}|z_1| + |z_2| + \cdots + |z_{g-1}| + \tfrac{1}{2}|z_g|$,
may be used instead of the area. The terms involving the function G will be
eliminated in the analysis.


MINIMAX TESTS FOR THE RATE OF A POISSON PROCESS
AND THE BIAS RATE OF A NORMAL PROCESS*


By J. V. BREAKWELL
Lockheed Aircraft Corporation, California, USA


minimax procedure is explained. The problem of distinguishing between two normal processes, with
the same diffusion rate but different bias rates, is treated as a limiting case of the Poisson problem.
The minimax solutions are also given for the extended problems in which a Poisson rate, or a normal
process bias rate, can have any value. It is desired to decide whether this rate is above or below some
critical rate, the loss due to a wrong decision being proportional to the difference between the actual
rate and the critical rate. In some Poisson situations the minimax procedure is a mixed strategy.


1. INTRODUCTION


The author's interest in the Poisson and normal processes was first aroused by
the possibility of using them in approximate descriptions of the sequential testing
of a binomial parameter. The Poisson process is to be used when this parameter
is small. They have, of course, a great number of other applications. It seems
worthwhile, therefore, to solve a basic statistical-economic problem connected with
them, namely, the determination of the minimax procedure for distinguishing between
two process rates when the cost of observation is proportional to the time of observation
of the process, and when the losses due to accepting the wrong rate are known. These
solutions are described in Sections 2, 3, and 4. In the case of the normal process
(Wiener process), the diffusion rate is assumed known, while the bias rate is in question.
This paper also contains (Sections 5 and 6) the solutions of the important
extensions of the above problems in which the process rate can now have any value
(it being desired to decide whether this rate is above or below some critical rate) and
in which the loss due to a wrong decision is proportional to the difference between
the actual rate and the critical rate. The constants of proportionality in the loss
function above and below the critical rate are not assumed to be the same. This form
of loss function was originated by Wald (1950, p. 9). This lack of economic
symmetry allows for introduction of the right amount of bias in the test procedure
when one type of wrong decision is economically more serious than the other. The
solution of the extended problem for a Wiener process was actually obtained and used
by the author sometime before solving the simpler dichotomous problem (Breakwell,
1954). This solution, in the case of economic symmetry, has been obtained independently
by Moriguti (1955) and by Morris H. De Groot in a doctoral dissertation,


*A preliminary account of most of the results in this paper was given in a contributed paper entitled
“Minimax Tests for the Parameter of a Poisson Process,” presented at the Western Regional Meeting of
the I.M.S. in Berkeley, July 1955.

The solution of the dichotomous problem for the Poisson process is a rather
straightforward application, adapted to continuous observation, of the method
presented in Chapter 10 of Theory of Games and Statistical Decisions by Girshick
and Blackwell (1954). The method first obtains (Section 2) a Bayes' family of
Sequential-Probability-Ratio tests (SPR tests) corresponding to various a priori weightings
between the two possible Poisson rates, and then selects (Section 3) the minimax
procedure by looking at the economically least favourable a priori weighting. The
determination of the SPR tests, based on the notion of a posteriori risk (Wald, 1950;
Girshick and Blackwell, 1954), requires first the solution of a problem about a “Poisson
random walk with absorbing barriers” (see Girshick and Blackwell, 1954; p. 269) to
obtain the operating characteristic (OC) and average observation time (AOT) for
a general SPR test. These have already been obtained by Burman (1946) and by
Dvoretsky, Kiefer and Wolfowitz (1953).

The solution of the dichotomous problem for the normal process may be
obtained by the same methods. The solution of the corresponding random walk
problem is considerably easier (using methods used by Wald (1947)) and is given
by Dvoretsky, Kiefer and Wolfowitz (1953) in which the diffusion rate is taken
as unity. Their formula for ET, however, needs clarification. The subsequent
determination of the Bayes' family of SPR tests in terms of the economic losses is, however,
more tedious than in the Poisson problem. The author of this paper has preferred,
therefore, to treat the normal process somewhat heuristically as a limiting case of the
Poisson process and to derive the solution of the dichotomous normal problem from
that of the Poisson problem.

The minimax solution of the dichotomous Poisson problem, unlike that of
the normal problem, can lead to a mixed test procedure, and this carries over to a
certain economic region for the extended problem.


The minimax solution of the extended Poisson problem has been applied
by the author to the problem of testing for a low fraction of defectives (1956); except
that, in economic situations where the minimax strategy is mixed, the author preferred
to recommend a “minimax pure strategy,” selected from the set of all possible SPR
tests. This set was augmented by the possible outright rejection of the hypothesis
that the Poisson rate is below critical.


2. THE BAYES' PROBLEM FOR DISTINGUISHING TWO POISSON PROCESSES


First of all, we may describe a Poisson process with rate $\lambda$ by saying that a
certain occurrence has probability $\lambda\Delta t + o(\Delta t)$ of occurring during any small time
interval $\Delta t$, regardless of how many times it has previously occurred. The number
N of occurrences during any time interval of length $t$ thus satisfies the Poisson formula:

\[ P\{N = n\} = \frac{e^{-\lambda t}(\lambda t)^n}{n!}, \qquad n = 0, 1, 2, \ldots \qquad (2.1) \]
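As a small numerical illustration of the Poisson formula (2.1), the following Python sketch evaluates the probability of n occurrences in an interval of length t for a process of rate λ; the function name and the numerical values are hypothetical.

```python
import math


def poisson_probability(n, rate, t):
    """P{N = n} for a Poisson process of the given rate observed for time t."""
    mean = rate * t
    return math.exp(-mean) * mean ** n / math.factorial(n)


rate, t = 2.0, 1.5
probabilities = [poisson_probability(n, rate, t) for n in range(60)]
print(probabilities[0])      # probability of no occurrence, exp(-rate * t)
print(sum(probabilities))    # the probabilities sum to 1 up to truncation
```

The probability of no occurrence in a short interval Δt is exp(−λΔt) = 1 − λΔt + o(Δt), in agreement with the verbal description of the process.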
We wish to observe this process and to test the hypothesis $H_1 : \lambda = \lambda_1$ against the
alternative hypothesis $H_2 : \lambda = \lambda_2 > \lambda_1$. Let us assume that the loss due to accepting
$H_2$ when $H_1$ is true is a known positive quantity $W_{12}$; and that the loss due to accepting
$H_1$ when $H_2$ is true is another known positive quantity $W_{21}$; and finally, that the cost
of observation is $c$ per unit of time.

If we are given an a priori weighting $\eta : 1-\eta$ of $H_1 : H_2$, the Bayes' procedure,
say $\Delta_\eta$, is characterized among all test procedures $\Delta$ as the one which will minimize
the expected risk:

\[ r(\eta, \Delta) = \eta r_1 + (1-\eta) r_2, \qquad (2.2) \]
where
\[ r_1 = W_{12} P(2|1) + c\,\bar{t}(1), \qquad r_2 = W_{21} P(1|2) + c\,\bar{t}(2), \qquad (2.3) \]
and where $P(j|i)$ denotes the probability of accepting $H_j$ when $H_i$ is true, while $\bar{t}(i)$
denotes the average observation time (AOT) when $H_i$ is true.

Now it is known generally [Blackwell and Girshick (1954); Wald (1950)]
that the Bayes' procedure $\Delta_\eta$ may be characterized as follows:

Starting with a priori weighting $\eta$ in favour of $H_1$, compute continually the
a posteriori weighting $\eta(t)$ after time $t$, and

accept $H_1$ if $\eta(t) \ge$ a certain value $\eta_1$ greater than zero and less than one;
accept $H_2$ if $\eta(t) \le$ a certain value $\eta_2$ greater than zero and not exceeding $\eta_1$;
continue to observe if $\eta_2 < \eta(t) < \eta_1$. ... (2.4)


In particular, if the original $\eta \ge \eta_1$, this procedure requires us to accept $H_1$ outright,
while if $\eta \le \eta_2$ we are to accept $H_2$ outright. The equality signs in (2.4) are optional.
Now, because of the convexity of the risk $r(\eta, \Delta_\eta)$ as a function of $\eta$, it follows
that $\eta_1$ and $\eta_2$ are determined uniquely by $\lambda_1$, $\lambda_2$ and the economic quantities $W_{12}$,
$W_{21}$, $c$. The actual determination will be described later. In particular, it may
happen that $\eta_1 = \eta_2$ and that the Bayes' procedure is always an outright decision.
Next, the a posteriori weighting $\eta(t)$ in favour of $H_1$ is expressible in the form

\[ \eta(t) = \frac{\eta}{\eta + (1-\eta)L(t)} \qquad (2.5) \]
where $L(t)$ is the likelihood ratio after time $t$ for distinguishing hypothesis $H_2$ from
hypothesis $H_1$. The Bayes' procedure $\Delta_\eta$, when $\eta_2 \le \eta < \eta_1$, is thus an SPR test
describable as follows:

accept $H_1$ if and when $L(t) \le \dfrac{\eta(1-\eta_1)}{(1-\eta)\eta_1}$;
accept $H_2$ if and when $L(t) > \dfrac{\eta(1-\eta_2)}{(1-\eta)\eta_2}$;
continue to observe as long as $\dfrac{\eta(1-\eta_1)}{(1-\eta)\eta_1} < L(t) \le \dfrac{\eta(1-\eta_2)}{(1-\eta)\eta_2}$. ... (2.6)

The unsymmetrical placing of the equality signs in (2.6) is optional in the general
characterization but is adopted here for convenience in what follows.
Next, the likelihood ratio $L(t)$ for distinguishing the Poisson process rate
$\lambda_2$ from the rate $\lambda_1$ is:

\[ L(t) = e^{-(\lambda_2-\lambda_1)t} \left( \frac{\lambda_2}{\lambda_1} \right)^{N}. \qquad (2.7) \]

It follows that the SPR test may now be characterized as follows
(see Figure 1):

accept $H_1$ if and when $y(t) \ge z$;
accept $H_2$ if and when $y(t) < 0$;
continue to observe as long as $0 \le y(t) < z$; ... (2.8)

where $y(t) = y + \mu t - N$,
and where
\[ \mu = \frac{\lambda_2-\lambda_1}{\ln(\lambda_2/\lambda_1)}, \qquad
y = \ln\frac{\eta(1-\eta_2)}{(1-\eta)\eta_2} \Big/ \ln\frac{\lambda_2}{\lambda_1}, \qquad
z = \ln\frac{\eta_1(1-\eta_2)}{(1-\eta_1)\eta_2} \Big/ \ln\frac{\lambda_2}{\lambda_1}. \qquad (2.9) \]

[Figure 1. Axis label: Number of Occurrences.]

The determination of $\eta_1$ and $\eta_2$ in terms of $\lambda_i$, $W_{ij}$, $c$, will be based on the facts (which
follow from the continuity of the Bayes' risk $r(\eta, \Delta_\eta)$ as a function of $\eta$) that whenever
$\eta(t) = \eta_1$ the statistician does equally well by accepting $H_1$ outright or by proceeding
with the appropriate SPR test, and that whenever $\eta(t) = \eta_2$ the statistician does
equally well by accepting $H_2$ outright or by proceeding with the appropriate SPR
test. In other words, the placing of the equality signs in (2.4) and (2.6) is not crucial.
Now, it is apparent from Figure 1 that during any infinitesimal time interval
$\Delta t$, the distance $y(t)$ from the upper boundary will either increase by an amount
$\mu\Delta t$ (this with probability $1-\lambda_i\Delta t + o(\Delta t)$) or will decrease by $1 + O(\Delta t)$ (this with
probability $\lambda_i\Delta t$) where $i = 1$ or 2, according as $H_1$ or $H_2$ is true. It follows that the
expected further risk, if $\eta(t) = \eta_1$ and if the statistician proceeds with the appropriate
SPR test so that $y = z$, is

\[ c\Delta t + \eta_1\{(1-\lambda_1\Delta t) \cdot 0 + \lambda_1\Delta t\,[W_{12}P(2|1, z-1) + c\,\bar{t}(1, z-1)]\} \]
\[ {} + (1-\eta_1)\{(1-\lambda_2\Delta t)W_{21} + \lambda_2\Delta t\,[W_{21}P(1|2, z-1) + c\,\bar{t}(2, z-1)]\} + o(\Delta t), \]

where we have written $P(j|i, y)$ and $\bar{t}(i, y)$ to indicate the dependence of $P(j|i)$ and
$\bar{t}(i)$ on $y$, but have not indicated their dependence on $z$. Equating this to the expected
risk $(1-\eta_1)W_{21}$ [Blackwell and Girshick (1954), p. 269] corresponding to outright
acceptance of $H_1$, dividing by $\Delta t$, and letting $\Delta t \to 0$, we obtain

\[ c + \eta_1\lambda_1[W_{12}P(2|1, z-1) + c\,\bar{t}(1, z-1)] = (1-\eta_1)\lambda_2[W_{21}P(2|2, z-1) - c\,\bar{t}(2, z-1)]. \qquad (2.10) \]

Furthermore, the expected further risk, if $\eta(t) = \eta_2$ and if the statistician proceeds
with the appropriate SPR test so that $y = 0$, is

\[ c\Delta t + \eta_2\{(1-\lambda_1\Delta t)[W_{12}P(2|1, \mu\Delta t) + c\,\bar{t}(1, \mu\Delta t)] + \lambda_1\Delta t\,W_{12}\} \]
\[ {} + (1-\eta_2)\{(1-\lambda_2\Delta t)[W_{21}P(1|2, \mu\Delta t) + c\,\bar{t}(2, \mu\Delta t)] + \lambda_2\Delta t \cdot 0\} + o(\Delta t). \]

Equating this to the expected risk $\eta_2 W_{12}$, corresponding to outright acceptance of
$H_2$, and letting $\Delta t \to 0$, we obtain

\[ \eta_2[W_{12}P(1|1, 0) - c\,\bar{t}(1, 0)] = (1-\eta_2)[W_{21}P(1|2, 0) + c\,\bar{t}(2, 0)]. \qquad (2.11) \]

Note that according to our convention on the equalities in (2.6) and (2.8), the SPR
test starting with $y = 0$ is not trivial, being quite different from outright acceptance
of $H_2$.


The OC function is known from Burman (1946) and Dvoretsky, Kiefer and
Wolfowitz (1953). Thus:

    $1 - P(2|i, y) = P(1|i, y) = \dfrac{F(y, \lambda_i/\mu)}{F(z, \lambda_i/\mu)}$,          ... (2.12)

where

    $F(y, x) = e^{xy}\, e^*(-x e^{-x}, y)$,          ... (2.13)

and $e^*(a, y)$ denotes the “Tapered Exponential Function” [see Hammersley (1953)]
defined by

    $e^*(a, y) = \sum_{k=0}^{[y]} \dfrac{a^k (y-k)^k}{k!}$,          ... (2.14)

$[y]$ denoting the largest integer not exceeding $y$.
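As a numerical illustration of the OC function, the sketch below evaluates a tapered exponential of the form $e^*(a, y) = \sum_{k \le [y]} a^k (y-k)^k / k!$, the function $F(y, x) = e^{xy} e^*(-x e^{-x}, y)$, and the acceptance probability $P(1|i, y) = F(y, \lambda_i/\mu)/F(z, \lambda_i/\mu)$. The function names are hypothetical, and the sketch assumes $F$ is taken as zero for negative $y$.

```python
import math

def tapered_exp(a, y):
    """Tapered exponential: sum of a^k (y-k)^k / k! for k = 0..[y]."""
    if y < 0:
        return 0.0
    top = int(math.floor(y))
    return sum(a**k * (y - k)**k / math.factorial(k) for k in range(top + 1))

def F(y, x):
    """F(y, x) = exp(x*y) * e*(-x*exp(-x), y); taken as zero for y < 0."""
    if y < 0:
        return 0.0
    return math.exp(x * y) * tapered_exp(-x * math.exp(-x), y)

def prob_accept_h1(y, z, lam, mu):
    """P(1 | rate lam, start y): probability of reaching z before dropping below 0."""
    x = lam / mu
    return F(y, x) / F(z, x)
```

A useful check is that $F$ satisfies $\partial F/\partial y = x[F(y, x) - F(y-1, x)]$, which is the balance between the upward drift and the unit drops of $y(t)$; for $z < 1$ the acceptance probability reduces to $e^{-\lambda (z-y)/\mu}$.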
The AOT function is also obtained in Burman (1946) and Dvoretsky, Kiefer
and Wolfowitz (1953). It is: 


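    $\bar t(i, y) = \dfrac{1}{\lambda_i}\left\{ \dfrac{F(y, \lambda_i/\mu)}{F(z, \lambda_i/\mu)}\, G(z, \lambda_i/\mu) - G(y, \lambda_i/\mu) \right\}$,          ... (2.15)

where

    $G(y, x) = \sum_{i=0}^{[y]} [F(y-i, x) - 1]$.          ... (2.16)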
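The average observation time can be evaluated in the same way. The sketch below is an illustration under stated assumptions: it takes $F(y, x) = e^{xy} \sum_{k \le [y]} (-x e^{-x})^k (y-k)^k / k!$, $G(y, x) = \sum_{i \le [y]} [F(y-i, x) - 1]$, and $\bar t(i, y) = \lambda_i^{-1}\{[F(y)/F(z)]\,G(z) - G(y)\}$ with second argument $\lambda_i/\mu$; the function names are hypothetical.

```python
import math

def F(y, x):
    """F(y, x) = exp(x*y) * sum_k (-x e^{-x})^k (y-k)^k / k!, zero for y < 0."""
    if y < 0:
        return 0.0
    a = -x * math.exp(-x)
    top = int(math.floor(y))
    return math.exp(x * y) * sum(
        a**k * (y - k)**k / math.factorial(k) for k in range(top + 1)
    )

def G(y, x):
    """G(y, x) = sum over i = 0..[y] of (F(y - i, x) - 1), zero for y < 0."""
    if y < 0:
        return 0.0
    top = int(math.floor(y))
    return sum(F(y - i, x) - 1.0 for i in range(top + 1))

def mean_obs_time(y, z, lam, mu):
    """Average observation time when the true rate is lam and the test starts at y."""
    if y < 0 or y >= z:
        return 0.0
    x = lam / mu
    return (F(y, x) / F(z, x) * G(z, x) - G(y, x)) / lam
```

Two checks are natural: for $z < 1$ the result should reduce to $(1 - e^{-\lambda T})/\lambda$ with $T = (z-y)/\mu$, and in general $\bar t$ should satisfy $\mu\,\bar t'(y) - \lambda\,\bar t(y) + \lambda\,\bar t(y-1) + 1 = 0$.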


Insertion of (2.12) and (2.15) into (2.10) and (2.11), taking into account that А 2 a) 


: АТ MS i Н ЭУ A А) _ А: 
апа F(z, 4) [; = 1,2] аге inereasing functions ofz as is Ё (2, =) ( 1-45) 


al 2, i), leads, after some manipulation, to the main result of this Section : 
1f (баса Жы ру” 1 (acuta 196 1} S 
c c 


then: (a) There are unique numbers $z$, $\eta_1$ and $\eta_2$ ($< \eta_1$) obtainable (after first solving
(2.9) for $\mu$) from the following three equations:


(еты 1. А) @(z,%*)— F(z, Аа 


ш 
(O Was (1) (20) (60) 2l (2.17) 
ЕЕ 


subject to the restriction that both brackets { } of the left member be positive;


AgWa +e G (5-1, А) —с 


1-т UMass в(«—15) URN 2. (2.18) 
im (аум Metti p) au 


(b) If $\eta_2 < \eta < \eta_1$, the Bayes’ procedure $\Delta = \Delta_\eta$ for distinguishing between rates
$\lambda_1$ and $\lambda_2$ ($> \lambda_1$), given an a priori weighting $\eta : 1-\eta$ in favour of $\lambda_1$, is the SPR test
defined by the parameters $\mu$, $z$, and $y$. Here $z$ is obtained from (a) and $y$ from (2.9).

For values of $\eta$ outside the interval $(\eta_2, \eta_1)$, of course, the Bayes’ procedure $\Delta_\eta$ is an
outright decision according to (2.4).
The solution of the transcendental equation (2.17) for z may be facilitated by
a short table of the tapered exponential function given in Hammersley (1953). 


Tf, on the other hand, (Cx) - 1) [O EE 1j « 1, an outright 


decision is always preferable, according to the scheme:

    accept $H_1$ when $\eta > \eta_0$; accept $H_2$ when $\eta < \eta_0$;          ... (2.19)

where $\eta_0$ is given by:

    $\eta_0 W_{12} = (1-\eta_0) W_{21}$,          ... (2.20)

the correct decision being optional when $\eta = \eta_0$.

Tt should be noted that if 


ШЕЛ -4) [АЯ En SUE 
wads 
Б (Wai) (а +1) < ш (% ШЕШЕ) 


then the parameter $z$ lies in the interval $0 < z < 1$, and the SPR test becomes a
“limited sequential test”, more simply described in terms of a single parameter:

    $T = (z - y)/\mu$,          ... (2.21)

as follows:

    test for a time $t$ not exceeding $T$;
    accept $H_2$ if and when anything occurs;
    accept $H_1$ otherwise.


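The acceptance probabilities and average observation time are simply:

    $1 - P(2|i) = P(1|i) = e^{-\lambda_i T}$,          ... (2.22)

and

    $\bar t(i) = \dfrac{1 - e^{-\lambda_i T}}{\lambda_i}$.          ... (2.23)

In place of the test parameter $y$, which increased from 0 to $z$ as $\eta$ increased from $\eta_2$
to $\eta_1$, we now have the test parameter $T$, which decreases from $T_0$ to 0, where $T_0 = z/\mu$
and is given by the transcendental equation:

    $\left\{ W_{12} - \dfrac{c}{\lambda_1}(e^{\lambda_1 T_0} - 1) \right\}(\lambda_2 W_{21} - c) = (\lambda_1 W_{12} + c)\left\{ W_{21} + \dfrac{c}{\lambda_2}(e^{\lambda_2 T_0} - 1) \right\}$.          ... (2.24)

$\eta_1$ is now obtained from:

    $\dfrac{\eta_1}{1-\eta_1} = \dfrac{\lambda_2 W_{21} - c}{\lambda_1 W_{12} + c}$,          ... (2.25)

while $\eta_2$ is obtained from:

    $\dfrac{\eta_2}{1-\eta_2} = \dfrac{W_{21} e^{-\lambda_2 T_0} + (c/\lambda_2)(1 - e^{-\lambda_2 T_0})}{W_{12} e^{-\lambda_1 T_0} - (c/\lambda_1)(1 - e^{-\lambda_1 T_0})}$.          ... (2.26)

Finally, the test parameter $T$ is determined as a function of $\eta$ over the interval $(\eta_2, \eta_1)$
by means of:

    $\dfrac{\eta}{1-\eta} = \dfrac{\eta_1}{1-\eta_1}\, e^{-(\lambda_2 - \lambda_1) T}$.          ... (2.27)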
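The limited sequential test is simple enough to evaluate in closed form. The sketch below is a hedged illustration, not the author's computation: it assumes $P(1|i) = e^{-\lambda_i T}$ and $\bar t(i) = (1 - e^{-\lambda_i T})/\lambda_i$, forms the Bayes risk $\eta r_1 + (1-\eta) r_2$, and locates the stationary value of $T$ by differentiating that risk. The function names are hypothetical and $\lambda_1 > 0$ is assumed.

```python
import math

def limited_test_risk(eta, T, lam1, lam2, W12, W21, c):
    """Bayes risk of 'test up to time T, accept H2 at the first occurrence'."""
    p1_h1 = math.exp(-lam1 * T)          # P(accept H1 | H1): nothing occurs by T
    p1_h2 = math.exp(-lam2 * T)          # P(accept H1 | H2)
    t1 = (1.0 - p1_h1) / lam1            # mean observation time under H1
    t2 = (1.0 - p1_h2) / lam2            # mean observation time under H2
    r1 = W12 * (1.0 - p1_h1) + c * t1    # risk under H1
    r2 = W21 * p1_h2 + c * t2            # risk under H2
    return eta * r1 + (1.0 - eta) * r2

def stationary_T(eta, lam1, lam2, W12, W21, c):
    """Value of T at which d(risk)/dT = 0, or 0.0 if no positive root exists."""
    ratio = (1.0 - eta) * (lam2 * W21 - c) / (eta * (lam1 * W12 + c))
    if ratio <= 1.0:
        return 0.0
    return math.log(ratio) / (lam2 - lam1)
```

For example, with $\lambda_1 = 1$, $\lambda_2 = 3$, $W_{12} = W_{21} = 10$, $c = 1$ and $\eta = 0.6$, the stationary $T$ is about 0.28 and the risk there is lower than the risk 4.0 of accepting $H_1$ outright. Whether the limited test is itself the Bayes' procedure still depends on the condition $0 < z < 1$.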


Formulae (2.21) through (2.27) are applicable to the case in which $\lambda_1 = 0$, provided
that the right members of (2.23) and (2.26) and the left member of (2.24) are allowed
to assume their obvious limiting forms. Indeed, when $\lambda_1 = 0$, if an SPR test is to
be a Bayes’ procedure, it must necessarily be a “limited sequential test”, as may easily
be demonstrated by a separate treatment of the likelihood ratio $L(t)$.


3. THE MINIMAX PROCEDURE FOR DISTINGUISHING TWO POISSON PROCESSES 


Let us suppose first that { (Жш —1 М (=н —1 } > 1. Then 


the minimized Bayes’ risk $r(\eta, \Delta_\eta) = \min_\Delta r(\eta, \Delta)$, obtained in Section 2 over the
$\eta$-interval $(\eta_2, \eta_1)$, has a slope given by:

    $\dfrac{d}{d\eta}\, r(\eta, \Delta_\eta) = r_1 - r_2$.          ... (3.1)

This follows from the fact that $r(\eta, \Delta)$, restricted to test procedures $\Delta$ which are SPR
tests, must be stationary at $\Delta_\eta$ with respect to the test parameters $\mu$, $z$ and $y$, only
the last of which is a function of $\eta$.

Furthermore, the difference $r_1 - r_2$ is a monotonically decreasing function
of $\eta$.


For values of $\eta$ outside the interval $(\eta_2, \eta_1)$ we have:

    for $\eta > \eta_1$, $r(\eta, \Delta_\eta) = (1-\eta)W_{21}$; for $\eta < \eta_2$, $r(\eta, \Delta_\eta) = \eta W_{12}$.          ... (3.2)

But $y(\eta_1 - 0) = z$ and $r_1(\mu, z, z) - r_2(\mu, z, z) = -W_{21}$. On the other hand, $y(\eta_2 + 0) = 0$,
but $r_1(\mu, z, 0) - r_2(\mu, z, 0) < W_{12}$. Thus the graph of $r(\eta, \Delta_\eta)$ versus $\eta$ over the whole
interval (0, 1) has everywhere a non-increasing slope which is discontinuous at
$\eta = \eta_2$.

This graph thus takes one of the general forms in Figures 2 and 3 according
as $\dfrac{d}{d\eta} r(\eta, \Delta_\eta)$, evaluated at $\eta = \eta_2 + 0$, is or is not non-negative. Because of (3.1)
we see that Figure 2 is typical if $r_1(\mu, z, 0) \ge r_2(\mu, z, 0)$; otherwise Figure 3 is
typical.

Figure 2.                    Figure 3.

The SPR tests described in Section 2 have been selected in the light not
only of assumed known economic losses $W_{12}$ and $W_{21}$, and an assumed known test
cost $c$ per unit time, but also of an assumed known a priori weighting $\eta : 1-\eta$ between
a Poisson rate $\lambda_1$ and a rate $\lambda_2$.

If, instead, $\eta$ is not known, a “safe” course is to look for the minimax procedure
(relative to the two rates $\lambda_1$ and $\lambda_2$), i.e., a procedure $\Delta = \Delta^*$ which will
minimize $\max(r_1(\Delta), r_2(\Delta))$.

In situations typified by Figure 2, the minimax test procedure is simply
the SPR test which corresponds to the abscissa $\eta^*$ of the highest point on the graph,
which test is defined by a test parameter $y = y^*$ (in addition to the test parameters
$\mu$ and $z$ previously defined) given by the further transcendental equation:

    $r_1(\mu, z, y^*) = r_2(\mu, z, y^*) = r^*$, say.          ... (3.3)

Indeed, this SPR test $\Delta^*$ compares favourably with any other conceivable test procedure
$\Delta$, in the sense that

    $\eta^* r_1(\Delta) + (1-\eta^*) r_2(\Delta) \ge \eta^* r_1(\Delta^*) + (1-\eta^*) r_2(\Delta^*)$,

from which it follows, because of (3.3), that $\max(r_1(\Delta), r_2(\Delta)) \ge r^*$.

In those situations typified by Figure 3, the minimax procedure is a mixed
procedure $\Delta_{\xi^*}$ defined as follows: either accept $H_2$ outright, this with weight $\xi^*$, or
apply the SPR test defined by $y = 0$ (and by the previously defined $\mu$ and $z$), this
with weight $1-\xi^*$, where $\xi^*$ is given by:

    $\xi^* = \dfrac{r_2(\mu, z, 0) - r_1(\mu, z, 0)}{r_2(\mu, z, 0) - r_1(\mu, z, 0) + W_{12}}$,          ... (3.4)

which is the statistician’s solution in the following 2×2 game between the statistician
and “Nature”:

                                                   Nature
                                             $\eta$            $1-\eta$
                                             $H_1$             $H_2$
    statistician  $\xi$     accept $H_2$ outright    $W_{12}$          $0$               ... (3.5)
                  $1-\xi$   SPR test with $y = 0$    $r_1(\mu, z, 0)$    $r_2(\mu, z, 0)$
The mixed strategy $\Delta_{\xi^*}$ leads, as is easily shown, to a risk independent of Nature’s
choice of process rate $\lambda$, namely:

    $r_1(\Delta_{\xi^*}) = r_2(\Delta_{\xi^*}) = r^*$, say, $= \dfrac{W_{12}\, r_2(\mu, z, 0)}{r_2(\mu, z, 0) - r_1(\mu, z, 0) + W_{12}}$.          ... (3.6)

Nature’s best strategy, of course, is

    $\eta = \eta_2 = \dfrac{r_2(\mu, z, 0)}{r_2(\mu, z, 0) - r_1(\mu, z, 0) + W_{12}}$,          ... (3.7)

in agreement with (2.11), which leads to a risk equal to $r^*$ in (3.6), whether the statistician
accepts $H_2$ outright or uses the SPR test with $y = 0$.

We verify that the above mixed strategy is minimax by observing that no
other strategy $\Delta$, mixed or otherwise, can lead to a value for $\eta_2 r_1(\Delta) + (1-\eta_2) r_2(\Delta)$
lower than $\eta_2 r_1(\Delta_{\xi^*}) + (1-\eta_2) r_2(\Delta_{\xi^*})$.

From this, because of (3.6), it follows again that

    $\max(r_1(\Delta), r_2(\Delta)) \ge r^*$.
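The 2×2 game behind the mixed procedure can be solved by equalizing the two risks. The following sketch is illustrative only: `mixed_minimax` is a hypothetical name, the inputs `r1` and `r2` stand for the risks $r_1(\mu, z, 0)$ and $r_2(\mu, z, 0)$ of the SPR test with $y = 0$, and the Figure 3 situation ($r_2 > r_1$ and $r_1 < W_{12}$) is assumed.

```python
def mixed_minimax(W12, r1, r2):
    """Solve the game: rows 'accept H2 outright' (W12, 0) and 'SPR, y=0' (r1, r2).

    Returns (xi, r_star, eta): the weight on outright acceptance of H2,
    the equalized risk, and Nature's least favourable prior weight on H1.
    """
    denom = r2 - r1 + W12
    xi = (r2 - r1) / denom        # equalizes the risks under H1 and H2
    r_star = W12 * r2 / denom     # common value of the two risks
    eta = r2 / denom              # makes the statistician indifferent
    return xi, r_star, eta
```

With $W_{12} = 10$, $r_1 = 2$, $r_2 = 5$ the weight is $\xi^* = 3/13$ and the common risk is $50/13$, whichever process rate Nature chooses.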
Suppose, finally, that 


с 


{ (2-41) Ға шү Қ Grats —1}< 1. 


Then, as pointed out at the end of Section 2, the Bayes’ procedure is always an outright
decision. The graph of $r(\eta, \Delta_\eta)$ vs. $\eta$ takes the simple form shown in Figure 4.

Figure 4.

The minimax strategy $\Delta_{\xi^*}$ is here the solution of the 2×2 game,

                                                   Nature
                                             $\eta$         $1-\eta$
                                             $H_1$          $H_2$
    statistician  $\xi$     accept $H_2$ outright    $W_{12}$       $0$               ... (3.8)
                  $1-\xi$   accept $H_1$ outright    $0$            $W_{21}$

    $\xi^* = \dfrac{W_{21}}{W_{12} + W_{21}}$,          ... (3.9)

which again leads to a risk independent of Nature's choice:

    $r^* = r_1(\Delta_{\xi^*}) = r_2(\Delta_{\xi^*}) = \dfrac{W_{12} W_{21}}{W_{12} + W_{21}}$.          ... (3.10)

Nature's best strategy, of course, is $\eta = \eta_0$, given by (2.20), which again leads to a risk
equal to $r^*$ in (3.10), independent of the statistician’s choice.


4. THE NORMAL PROCESS PROBLEM AS A LIMITING CASE 


If $y$ and $z - y$ are large and $\lambda_1$ not too small, the decision to accept $H_1$ or the
decision to accept $H_2$ will take place only after a large number of occurrences. If
$t$ is a typical time of decision, the quantities $\lambda_1 t$ and $\lambda_2 t$ are thus large, so that the Poisson
distribution of the number of occurrences in time $t$ may be approximated by a normal
distribution with mean and variance $\lambda_i t$ ($i = 1, 2$). If, in addition, the difference
$|\lambda_2 - \lambda_1| \ll \lambda_1, \lambda_2$, it follows not only that the random walk in Figure 1 may be
regarded as a “normal diffusion process” or “fundamental diffusion process” with
a bias rate [Chandrasekhar (1943), Uhlenbeck (1945)], but also that the two parameters
$\lambda_1$ and $\lambda_2$ correspond to essentially the same diffusion rate. In other words, the
problem in distinguishing two Poisson processes approaches the problem of distinguishing
between two “Wiener processes” (Dvoretsky, Kiefer and Wolfowitz, 1953) with
different bias rates but the same diffusion rate. To see this in greater detail, let the
rates $\lambda_1$ and $\lambda_2$ be expressed in terms of two rates $R$ and $\Delta m$, and a number $\nu$ (which
will later tend to $\infty$) as follows:


    $\lambda_1 + \lambda_2 = 2\nu R$,   $\lambda_2 - \lambda_1 = \Delta m\,\sqrt{\nu}$.          ... (4.1)

Furthermore, let $\lambda_0$ be another rate differing from $\lambda_1$ or $\lambda_2$ by a quantity of order $\sqrt{\nu}$,
and let

    $m_i = \dfrac{\lambda_i - \lambda_0}{\sqrt{\nu}}$   ($i = 1, 2$),          ... (4.2)

so that (because of (4.1))

    $m_2 - m_1 = \Delta m$,          ... (4.3)

as implied by the notation. Then, as $\nu \to \infty$, the distribution of the random quantity
$X(t)$, defined by

    $X(t) = \dfrac{N_t - \lambda_0 t}{\sqrt{\nu}}$,          ... (4.4)

tends to a normal distribution with variance $Rt$ and mean $m_1 t$ or $m_2 t$, according
as $H_1$ or $H_2$ is true. In other words, the time series $X(t)$ is asymptotically a normal
(Wiener) process with variance $R$ per unit time and bias $m_1$ or $m_2$ per unit time, according
as $H_1$ or $H_2$ is true.
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rne | 


EE = ais TUM G у, 65) 
^ "TUE UY 0 (5 |. 
Let us next introduce test parameters $Y$ and $Z$, defined by

    $Y = y/\sqrt{\nu}$,   $Z = z/\sqrt{\nu}$.          ... (4.6)

By means of (4.2), (4.4), (4.6) and the first equation (4.5), the SPR test characterized
in (2.8) may be re-characterized (as $\nu \to \infty$) as follows (Figure 5):

    Accept $H_1$ if and when $X(t) \le m_T t - (Z - Y)$;
                                                                  ... (4.7)
    Accept $H_2$ if and when $X(t) \ge m_T t + Y$;

where

    $m_T = \tfrac{1}{2}(m_1 + m_2)$.          ... (4.8)

Figure 5.
Next, the Laplace transforms of the functions $F$ and $G$ of (2.13) and (2.16) are:

    $\int_0^\infty e^{-sy} F(y, x)\, dy = \dfrac{1}{s - x + x e^{-s}}$          ... (4.9)

and

    $\int_0^\infty e^{-sy} G(y, x)\, dy = \dfrac{x}{s\,(s - x + x e^{-s})}$.          ... (4.10)
Using (4.6) and letting $\nu \to \infty$, the Laplace transforms of $F$ and $G$ are easily
inverted. From (2.12) and (2.15) we thus obtain:

    $P(2|1) = \dfrac{\exp[-2(m_T - m_1)Y/R] - \exp[-2(m_T - m_1)Z/R]}{1 - \exp[-2(m_T - m_1)Z/R]}$,          ... (4.11)

    $P(1|2) = \dfrac{\exp[2(m_2 - m_T)Y/R] - 1}{\exp[2(m_2 - m_T)Z/R] - 1}$,          ... (4.12)

and

    $\bar t(i) = \dfrac{Z\,P(1|i) - Y}{m_T - m_i}$.          ... (4.13)


These formulas are not really new, being obtainable from Wald’s treatment of the
discrete problem of distinguishing between the means of two normal distributions.
First, as in Dvoretsky, Kiefer and Wolfowitz (1953), with variance now $R$ per unit
time, the likelihood ratio is

    $L(t) = \exp\left\{ \dfrac{\Delta m}{R}\,[X(t) - m_T t] \right\}$.

Comparison with (4.7) shows that the Wald parameters $A$ and $B$ are given by:
$\ln A = Y\Delta m/R$ and $\ln B = -(Z - Y)\Delta m/R$. Next, (3.43) and (3.48) of Wald (1947),
with $\theta_0$ and $\theta_1$ replaced by $m_1\Delta t$ and $m_2\Delta t$, and with $\theta$ replaced in turn by $m_1\Delta t$ and
$m_2\Delta t$, yield (4.11) and (4.12). Finally, (4.13) is obtainable from (3.57) and (3.60)
of Wald (1947), with $\sigma^2$ replaced by $R\Delta t$.


As pointed out in Dvoretsky, Kiefer and Wolfowitz (1953), Wald's formulas 
are exact in the case of the normal process, since there is no longer any overflow at the 
boundaries of the random walk. 
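Since the Wald formulas are exact for the normal process, they are easy to evaluate directly. The sketch below is an illustration under stated assumptions: in the frame $X(t) - m_T t$ the process has drift $d = m - m_T$ and variance $R$ per unit time, $H_2$ is accepted at the upper boundary $Y$ and $H_1$ at the lower boundary $-(Z - Y)$. The function name `wald_normal_test` is hypothetical.

```python
import math

def wald_normal_test(m, m_T, Y, Z, R):
    """Return (P(accept H1), mean observation time) for true bias rate m.

    Boundaries in the frame X(t) - m_T*t: upper at Y (accept H2),
    lower at -(Z - Y) (accept H1); requires 0 < Y < Z and R > 0.
    """
    d = m - m_T
    if abs(d) < 1e-12:
        # Driftless limit: gambler's-ruin probability and mean exit time.
        p_accept_h1 = Y / Z
        mean_time = Y * (Z - Y) / R
    else:
        p_accept_h1 = math.expm1(2.0 * d * Y / R) / math.expm1(2.0 * d * Z / R)
        # Wald's identity: d * E[time] = E[exit position] = Y - Z * P(accept H1).
        mean_time = (Y - Z * p_accept_h1) / d
    return p_accept_h1, mean_time
```

With $m_T = 0$, $Y = 2$, $Z = 4$, $R = 1$ the two error probabilities at $m = \pm 0.5$ coincide, as the symmetry of the boundaries suggests, and the values agree with the explicit form of $P(2|1)$ above. Very large values of $2dZ/R$ would overflow `math.expm1`, so the sketch is meant for moderate arguments only.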


Our only purpose in having derived these formulas here in a rather heuristic
manner is to extend our treatment immediately to obtain the Bayes’ and minimax
procedures for distinguishing between the bias rates of two normal processes with
the same diffusion rate $R$, when the losses $W_{12}$ and $W_{21}$ (due to a wrong decision) are
known and when the cost of observation is $c$ per unit time.


From the main result of Section 2, we now proceed to the main result of this 
Section : 
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For any positive values of W,., И, c, В, and Am, (a) there exist unique numbers 
$, 71,7» апа Z, obtainable successively from : 


{ DE ae fuc ) = sinh’, EA) 
Ur MN. и CR 
hah Nader cu NL 

CERT eto eie Mala (4.16) 
Т т 
апа ы 52 (417) 


(b) The Bayes’ procedure $\Delta_\eta$ for distinguishing between the bias rates $m_1$ and $m_2$,
given an a priori weighting $\eta : 1-\eta$ in favour of $m_1$, is as follows:

for $\eta > \eta_1$, accept $H_1$ outright; for $\eta < \eta_2$, accept $H_2$ outright;

for $\eta_2 < \eta < \eta_1$, apply the SPR test, defined by (4.7) and (4.8) or Figure 5, with $Z$
determined from (a) and $Y$ determined by:


2 qax.) 
ү ұра Ip ... (4.18) 
2$ "(ane 
The minimized Bayes’ risk $r(\eta, \Delta_\eta)$ again satisfies (3.1); and $r(\eta, \Delta)$, if restricted to SPR
tests, is again stationary with respect to the SPR parameters, in this case $m_T$, $Z$, $Y$.

As in the case of the Poisson process, $Y \to Z$ as $\eta \to \eta_1$ from below and the test
degenerates into outright acceptance of $H_1$ (see Figure 5 and/or formulas (4.11)
through (4.13)).

Furthermore, $Y \to 0$ as $\eta \to \eta_2$ from above, and, in contrast with the Poisson
problem, the test again degenerates, this time into outright acceptance of $H_2$. This
means that the curve of $r(\eta, \Delta_\eta)$ has a continuous slope at $\eta = \eta_2$ as well as elsewhere,
so that Figure 3 is never applicable.


As in Figure 2, then, the minimax test procedure is an unmixed procedure,
being the SPR test corresponding to the $\eta^*$ and $Y^*$ for which

    $r_1(m_T, Z, Y^*) = r_2(m_T, Z, Y^*)$.          ... (4.19)

This, with the aid of (4.11), (4.12) and (4.13), provides an additional transcendental
equation for the appropriate value $Y^*$ of $Y$.
5. THE EXTENDED MINIMAX PROBLEM FOR A NORMAL PROCESS


Let us now turn to an important extension of the previous problem wherein
the bias rate, say $m$, of a normal process can have any value and it is desired to decide
whether $m$ is above or below some critical rate $m_0$. In other words we are testing the
hypothesis $H_1$: $m < m_0$ against the hypothesis $H_2$: $m > m_0$. The loss $W_{21}$ due to
accepting $H_1$ when $H_2$ is true will be, in general, a monotonically increasing function
$W_{21}(m)$ of $m$ defined over the range $m > m_0$; while the loss $W_{12}$ due to accepting $H_2$
when $H_1$ is true will be, in general, a monotonically decreasing function $W_{12}(m)$ of $m$
defined over the range $m < m_0$.

Let us first confine our attention to the three-parameter family of Wald-type
tests defined by (4.7), the test parameters $Y$, $Z$, and $m_T$ now being regarded as independent.
The risk function $r$ is given by the two formulas (2.3), according as $m \lessgtr m_0$,
combined with the Wald formulas (4.11) through (4.13), with $m_1$ (now replaced by $m$)
taking on all values less than $m_0$, and $m_2$ (replaced by $m$) taking on all values greater
than $m_0$.

Assuming that the loss functions $W_{12}(m)$ and $W_{21}(m)$ vanish at the critical
rate $m = m_0$ and increase continuously (and differentiably) but not too drastically
as $m$ moves away from $m_0$ (so that the expected losses $W_{21}(m)P(1|2)$ and $W_{12}(m)P(2|1)$
will actually decrease to zero again as $|m - m_0|$ increases indefinitely), the typical
risk function has the general form of the full curve shown in Figure 6, in which the
dotted curve represents the contribution $c\bar t(m)$ of the cost of observation to the total
risk $r$.

Figure 6.

The minimax choice among the three-parameter family of the Wald-type tests is that
test which will render the maximum ordinate in Figure 6 as small as possible.
Now we would naturally expect from Figure 5 that $P(1|2)$, for fixed $Y$ and $Z$, would
be an increasing function of $m_T$ while $P(2|1)$ would be a decreasing function of $m_T$.
These facts may be verified from (4.11) and (4.12). We may expect, moreover, that the
maximum risk over the range $m > m_0$ would be a continuous increasing function of
$m_T$, while the maximum risk over the range $m < m_0$ would be a continuous decreasing
function of $m_T$. If so, the minimax choice of test parameters $Y$, $Z$, and $m_T$ necessarily
satisfies the condition:

    $r(m_1; m_T, Z, Y) = r(m_2; m_T, Z, Y)$   (as in Figure 6),          ... (5.1)

where

    $r_1(m_T, Z, Y) \equiv r(m_1; m_T, Z, Y) = \max_{m < m_0} r(m; m_T, Z, Y)$,
                                                                  ... (5.2)
    $r_2(m_T, Z, Y) \equiv r(m_2; m_T, Z, Y) = \max_{m > m_0} r(m; m_T, Z, Y)$.
Since $r$ is, except at $m_0$, a continuous and differentiable function of $m$, $Y$, $Z$ and $m_T$,
the minimization of $r_1$ and $r_2$, subject to (5.1), requires the existence of a number $\eta$
such that

    $\eta r_1 + (1-\eta) r_2$          ... (5.3)

is stationary with respect to the parameters $m_T$, $Z$, $Y$. But we have seen in Section
4 that this is the case for SPR tests belonging to the Bayes’ procedures for distinguishing
between the bias rates $m_1$ and $m_2$ (at which the risk function has its two equal
maximums). Moreover, (5.1) is identical with (4.19), so that our necessary conditions
(5.1) and (5.3) will both be satisfied by the minimax test for distinguishing between $m_1$
and $m_2$.


The possibility that the minimax Wald-type test would not only lead to a
risk function with two equal maximums, but would also be the minimax test for
distinguishing between the bias rates at these maximum points was suggested to the
author by both Herman Rubin and Melvin Peisakoff. This possibility, if realized,
has the important consequence that the test in question is minimax among all possible
test procedures, mixed or otherwise. For it follows from Section 4 that no other
test procedure can have a lower over-all maximum.

Given $c$, $m_0$ and the loss functions $W_{12}(m)$ and $W_{21}(m)$, we are led to look
for Wald-type test parameters $m_T$, $Z$ and $Y$ and for a real number $\phi$ which will satisfy
not only (5.1) and (5.2) but also (4.8), (4.14) and (4.17). The test, if obtainable, will
be minimax over the extended range of process rates among all possible test procedures.

This minimax solution has been carried out in the case of linear loss functions:

    $W_{12}(m) = A_1(m_0 - m)$, $m < m_0$;   $W_{21}(m) = A_2(m - m_0)$, $m > m_0$;          ... (5.4)
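The shape of the risk curve for a Wald-type test under linear losses can be explored numerically. The sketch below is an illustration under stated assumptions: it uses the exact normal-process formulas for the acceptance probability and mean observation time (upper boundary $Y$ accepting $H_2$, lower boundary $-(Z-Y)$ accepting $H_1$, in the frame $X(t) - m_T t$) and losses $A_1(m_0 - m)$ for $m < m_0$ and $A_2(m - m_0)$ for $m > m_0$. The function names are hypothetical.

```python
import math

def wald_normal_test(m, m_T, Y, Z, R):
    """Return (P(accept H1), mean observation time) for true bias rate m."""
    d = m - m_T
    if abs(d) < 1e-12:
        return Y / Z, Y * (Z - Y) / R
    p_accept_h1 = math.expm1(2.0 * d * Y / R) / math.expm1(2.0 * d * Z / R)
    return p_accept_h1, (Y - Z * p_accept_h1) / d

def linear_loss_risk(m, m0, A1, A2, c, m_T, Y, Z, R):
    """Risk r(m): expected loss of a wrong decision plus cost of observation."""
    p_accept_h1, mean_time = wald_normal_test(m, m_T, Y, Z, R)
    if m < m0:
        loss = A1 * (m0 - m) * (1.0 - p_accept_h1)   # wrongly accepting H2
    else:
        loss = A2 * (m - m0) * p_accept_h1           # wrongly accepting H1
    return loss + c * mean_time
```

In the symmetric case $A_1 = A_2$, $m_T = m_0$, $Y = Z/2$ the risk is an even function of $m - m_0$; at $m = m_0$ only the cost of observation remains, and far from $m_0$ the risk falls back towards zero. Scanning `linear_loss_risk` over a grid of $m$ gives the two maxima on either side of $m_0$.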


where $A_1$ and $A_2$ are positive constants which are not necessarily equal. Unless
they are equal, $m_T$ is not equal to $m_0$. Using (5.4), (4.14) provides a relation $R_1$


between the quantities ¢, к) ‚ апа 209) my ‚ While (5.1) using 
^ ) 


_ (2.3), (4.3), (4.8), (4.17) and (4.11 ) thro 
А(Ат) (m — т.) As(Am)*(m, —my) 
cR Э cR Г 

of (5.2) are : 


ugh (4.13), provides a relation R, between ø, Y/Z, 


Next, necessary conditions for the fulfilment 
д" (ти; Mp, ZY) _ Îr (ma; т,2, Y) 
LC Soy = 0 and РО nee zm «e (5.5) 


Which, because of (2.3), (4.3), (4.8), (4.17) and (4.11) through (4.13) provide respectively 
a relation R, between ф, T. 214)? ng A (Am)t(my—m;) 
Z cR cR 


‚ and a relation R, between 


Y :AyAmp... 4 (Amm, — : : 
$ o с and 4X та т). ‘The four relations Ry, Ra, Rs, and №, suflico 
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to determine? | 4, 440 E 
о cetermine 7. , Ж ! ep Ü =L 2), and ra ав functions of the single parameter 


$. Hence, using (4.8) and (4.17), the quantities 


: с us 
"n ү! E ... (5.6) 

ш vedi "2-р, | 

в, = pos |" (т,-ту, | 

Aay \13 
0, = ( ==] (my—m,) = (006 (К) 
аз well аз p : 
Tmax 

Or (ду (ону T 
in all of which Ау = ҚА,--4,), ... (5.8) 


are obtainable as functions of the single economic parameter:

    $\rho = A_2/A_1$.          ... (5.9)

Note that our formulation of the risk according to (2.3) and (5.4) involves four economic
parameters: $A_1$, $A_2$, $m_0$ and $c$. The forms (5.6) and (5.7) of the minimax solution
as a function of the single economic parameter $\rho$ of (5.9) could, however, have been
anticipated for “dimensional” reasons because of the invariance of the problem under
the group of scale changes, not only of time and cost but also of the scale on which the
normal process itself is measured (the vertical scale in Figure 5). The actual
numerical determination of the quantities (5.6) and (5.7) as functions of $\rho$ required the
assistance of an IBM CPC. These are shown in Figure 7. As could be expected,
the solution has a symmetry about $\rho = 1$, according to which the interchange of $A_1$
and $A_2$ is equivalent to the interchange of $Y$ and $Z - Y$, together with a change in the
sign of $m_T - m_0$.

This solution was given earlier by the author, Breakwell (1954), in an application
to the problem of testing for the fraction of defectives, and was used again in a slightly
different application on p. 255 of Breakwell (1956).

The minimax nature of the obtained solution will be established if we can prove
that the necessary conditions (5.5) are also sufficient, in combination with (4.8) and
(4.14), to insure condition (5.2). The author has not succeeded in obtaining an
analytical proof of this except in the economically symmetric case* $A_1 = A_2$, in which
$m_T = m_0$; but numerical computations at a large number of values of $\rho$ show that (5.2)
is, indeed, satisfied.

* A separate account of this case has been given by Morris H. DeGroot.
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6. THE EXTENDED MINIMAX PROBLEM FOR THE POISSON PROCESS 


We turn now to the corresponding Poisson problem. In this case we are
testing the hypothesis $H_1$: $\lambda <$ some $\lambda_0$ against the hypothesis $H_2$: $\lambda > \lambda_0$, the
loss functions $W_{12}$ and $W_{21}$ now being functions of $\lambda$, functions we again suppose
to vanish at the critical rate (in this case $\lambda_0$) and to increase continuously (and
differentiably) but not too drastically as $\lambda$ moves away from $\lambda_0$.

Guided by the results of Section 5, we look for a test procedure which not only
leads to equal maximums on the risk curve at rates $\lambda_1$ and $\lambda_2$, say, respectively below
and above the critical rate $\lambda_0$, but which is the minimax procedure for distinguishing
between these two particular rates. In the light of Sections 2 and 3, this means one
of two things: either

(1) The test procedure is an unmixed SPR test, with parameters $\mu$ and $z$
related to $\lambda_1$ and $\lambda_2$ (according to (2.9) and (2.17)), and with parameter $y$ related to
$\lambda_1$ and $\lambda_2$ (by (3.3)); or

(2) It is a mixture in a ratio $\xi : 1-\xi$ between outright acceptance of $H_2$
and an SPR test for which $y = 0$ and for which $\mu$ and $z$ are still related to $\lambda_1$ and $\lambda_2$
(according to (2.9) and (2.17)). Here $\xi$ is related to $\lambda_1$ and $\lambda_2$ by (3.4).

For the mixed strategy, the risk is defined by:

    $r(\lambda) = \xi W_{12}(\lambda) + (1-\xi)\{W_{12}(\lambda)P(2|\lambda; \mu, z, 0) + c\bar t(\lambda; \mu, z, 0)\}$, if $\lambda < \lambda_0$;
                                                                  ... (6.1)
    $r(\lambda) = (1-\xi)\{W_{21}(\lambda)P(1|\lambda; \mu, z, 0) + c\bar t(\lambda; \mu, z, 0)\}$, if $\lambda > \lambda_0$;

whereas for an unmixed SPR test the risk is:

    $r(\lambda) = W_{12}(\lambda)P(2|\lambda; \mu, z, y) + c\bar t(\lambda; \mu, z, y)$, if $\lambda < \lambda_0$;
                                                                  ... (6.2)
    $r(\lambda) = W_{21}(\lambda)P(1|\lambda; \mu, z, y) + c\bar t(\lambda; \mu, z, y)$, if $\lambda > \lambda_0$.

In either case, the test procedure, if obtainable, will be minimax over the extended
range of Poisson rates among all possible test procedures.

This minimax solution has been carried out in the case of loss functions of the
same linear type as that treated in the extended normal problem, namely:

    $W_{12}(\lambda) = K_1(\lambda_0 - \lambda)$, $\lambda < \lambda_0$;
                                                                  ... (6.3)
    $W_{21}(\lambda) = K_2(\lambda - \lambda_0)$, $\lambda > \lambda_0$;

where $K_1$ and $K_2$ are positive constants which are not necessarily equal. (Even
when $K_1$ and $K_2$ are equal, the minimax choice of test slope $\mu$ does not coincide with
$\lambda_0$ because of the lack of symmetry of the Poisson distribution.)

Relations are obtainable from Sections 2 and 3 and the locally maximum
character of the risk function $r(\lambda)$ at $\lambda_1$ and $\lambda_2$ which are sufficient to determine


А 
у or &, 2, № л, and hence Е апа A as functions of the two economie parameters : 
0 2 f 
р= К/К, amd %-20/К,. 2. (64) 


с 
This general form for the minimax solution could have been anticipated for dimensional
reasons because of the invariance of the problem under the group of scale changes
of time and cost. The scale of the Poisson process itself (the ordinate scale in Figure
1) is, of course, an intrinsically unique scale.


The numerical determination of this solution was carried out with the aid 
of an IBM 701 over the range .01 — p — 100. An unmixed minimax solution is 
obtained in this way at all “economic points" (р, д) lying above a certain boundary on 
which the minimax test has y = 0. This boundary is the heavy line in Figure 8 
and the upper dotted line in Figure 9. -Below the line the minimax solution is a mixed 
strategy with Е > 0 but у = 0. Figure 8 shows the values of 2 over the mixed region 
as well as the values of the ratio:- 


over that part of the unmixed region for which z > 1. 


As noted at the end of Section 2, those "limited" SPR tests for which z < 1 
are more simply described by a single parameter T, the upper limit to observation 
time. The quantity : 


у= Т Е ... (6.6) 


is shown in Figure 9 over the relevant region. Above this region is shown the parameter
$z$. The lower dotted line in Figures 9 and 10 is the upper boundary of a region
for which the left maximum of $r(\lambda)$ occurs at $\lambda_1 = 0$ with $r'(0) < 0$.


Figure 10 shows the minimized maximum risk through the ratio : 


= "вах. ЖА (6,7) 
ә Vek, (6.7) 


and the minimax test parameter д, if 2 > 1, through the quantity : 


8- дзе), E. in» (6.8) 


As in the case of the extended normal problem, the minimax nature of the
obtained solutions will be established if we can prove that the local maximum character
of the risk function in the neighbourhood of $\lambda_1$ and $\lambda_2$, in combination with (2.9) and
(2.17), is sufficient to insure that the risk function does assume its highest value at
$\lambda_1$ and $\lambda_2$. Again, this has not been proven, but numerical computations at a large
number of economie points (р, д) indicate that this is so. 
MINIMUM VARIANCE UNBIASED ESTIMATION IN COIN
TOSSING PROBLEMS 


By T. V. NARAYANA and Y. S. SATHE
University of Alberta and Summer Research Institute, Kingston 


SUMMARY. The use of series expansions in estimation problems for distributions involving
more than one parameter is illustrated in this paper. Unbiased minimum variance estimates are derived
for the probabilities $p_1$, $p_2$ of obtaining heads with two coins 1, 2 in two “Markov” binomial sampling
problems.


1. INTRODUCTION 


We consider the problem of unbiased minimum variance (UMV) estimation
in binomial sampling with two coins 1, 2 with probabilities $p_1$, $p_2$ of obtaining heads
respectively ($p_1 \ne p_2$). Such problems have been considered by Narayana (1955) and
Narayana, Sathe and Chorneyko (1960), and the use of series expansions in UMV
estimation problems has been discussed by DeGroot (1959) and Guttman (1958). In
Section 2, we consider an example of estimation in an inverse binomial sampling
problem which, though artificial, illustrates these principles and generalises previous
results. The generating function (g.f.) for this problem is given explicitly in
Section 3, following methods given by Feller (1957). The rest of the paper deals with
the application of similar combinatorial methods and the use of g.f.'s in related
problems (Narayana and Ladouceur).


2. UMV ESTIMATION IN MARKOV INVERSE BINOMIAL SAMPLING

Consider the game $N_r$ played with two coins 1, 2 with probabilities $p_1$, $p_2$
for heads ($0 < p_i < 1$; $p_1 \ne p_2$, $p_1 \ne q_2 = 1 - p_2$) according to the following rules:

(i) The first trial is made with coin 1.

(ii) The $n$-th trial is made with coin 1 or 2 according as the $(n-1)$-st trial
is tail or head ($n > 1$).

(iii) We stop the sequence of trials at that trial where the total number of
heads equals $r$ for the first time, $r$ being a positive integer.

In what follows x and O stand for head and tail respectively.

The game $N_r$ can end at the $(n+r)$-th trial ($n = 0, 1, 2, \ldots$) of a sequence of
$n$ O's and $r$ x's, the last trial being, of course, x. Consider the sequences ending
at the $(n+r)$-th trial which (a) either begin with an x and have $k-1$ changes of type
Ox or (b) begin with a O and have $k$ changes of type Ox. The probability associated
with such sequences is $p_1^k p_2^{r-k} q_1^{n-k+1} q_2^{k-1}$ and therefore $(n, k)$, by an appropriate choice
of $k$ for each $n$, can be considered a minimal sufficient statistic for the distribution.
For given $n$, $k$ varies between 1 and $\min(r, n+1)$. The sequences of type
(a), i.e. starting with x and having $k-1$ changes Ox, are in one-to-one correspondence
with the different possible arrangements of $r-2$ x's and $n$ O's with $k-1$ runs of O's.
The number of (a) sequences is thus

    $\binom{n-1}{k-2}\binom{r-1}{k-1}$.          ... (2.1)

Similarly the number of sequences (b) starting with exactly $v$ O's ($v = 1, 2, \ldots, n-k+1$)
is

    $\binom{n-1-v}{k-2}\binom{r-1}{k-1}$.          ... (2.2)

As the number of sequences in (2.1) is the same as substituting $v = 0$ in (2.2), the
total number of sequences with probability $p_1^k p_2^{r-k} q_1^{n-k+1} q_2^{k-1}$ is

    $N_r(k, n) = \sum_{v=0}^{n-k+1} \binom{n-1-v}{k-2}\binom{r-1}{k-1} = \binom{n}{k-1}\binom{r-1}{k-1}$.          ... (2.3)
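The count $N_r(k, n) = \binom{n}{k-1}\binom{r-1}{k-1}$ can be checked by brute-force enumeration for small $r$ and $n$. The sketch below is only a verification aid with hypothetical function names; it identifies $k$ with the number of heads thrown with coin 1, that is, heads occurring at the first trial or immediately after a tail.

```python
from itertools import product
from math import comb

def count_by_formula(r, n, k):
    """Closed form for the number of sequences with statistic (n, k)."""
    return comb(n, k - 1) * comb(r - 1, k - 1)

def count_by_enumeration(r, n):
    """Enumerate sequences of n tails 'O' and r heads 'x' ending in 'x'.

    Returns a dict mapping k (heads thrown with coin 1) to the number
    of such sequences.
    """
    counts = {}
    for prefix in product("xO", repeat=n + r - 1):
        seq = prefix + ("x",)          # the last trial is always a head
        if seq.count("x") != r:
            continue
        k = sum(
            1
            for i, s in enumerate(seq)
            if s == "x" and (i == 0 or seq[i - 1] == "O")
        )
        counts[k] = counts.get(k, 0) + 1
    return counts
```

For $r = 3$, $n = 4$ the enumeration gives 1, 8 and 6 sequences for $k = 1, 2, 3$, in agreement with the closed form; `math.comb` requires Python 3.8 or later.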


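Hence

    $\sum_{n=0}^{\infty} \sum_{k=1}^{\min(r, n+1)} N_r(k, n)\, p_1^k p_2^{r-k} q_1^{n-k+1} q_2^{k-1}
       = \sum_{n=0}^{\infty} \sum_{k=1}^{\min(r, n+1)} N_r(k, n)\, \theta_1^k \theta_2^n\, m(\theta_1, \theta_2) = 1$,          ... (2.4)

where $\theta_1 = \dfrac{p_1 q_2}{p_2 q_1}$, $\theta_2 = q_1$ and $m(\theta_1, \theta_2) = \dfrac{p_2^r q_1}{q_2} = \dfrac{(1-\theta_2)^r}{\theta_1 (1 - \theta_2 + \theta_1\theta_2)^{r-1}}$.

The form (2.4) indicates that the statistic $(k, n)$ is complete. Therefore the unique
UMV estimates of $p_1$, $p_2$ are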


ЖАА АУ лақ 


ES cS Е 


YN M Ж ; PETES ANE TA . -(2.5) 
Cx em AES EN INR. a 
3. THE GENERATING FUNCTION OF $N_r(s)$


Consider the game N, which is played with the same rules as N „ except that 
at the first trial coin 2 is used instead of coin l(cf (i), Section 2). Let №, (в), N,(8) 
denote the g.f. of these games i.e. 


2 Ф min (r, п+1) » 
N,x(s) = E А. М, (Е, турі pg* gittigi- gni2, Ra D) 


n=0 


A method similar to Section 2 gives 


ж min (r,n4-1) 7) r Суы 
Мг) = 5" ИСО! 
5-І k $ * 


7-0 5-0 
А 
where we make the convention that the coefficient ( ne ) = 1,( hi ) = 0 for all 
positive п. Setting No(s) = М, (8) = 1, we have the relations 


14) = p2sN,_1(8)+-928N (в) 


: (3.3) 
N,(8) = рв, y (8)4-g,8N, (5) J 
with Мз) = 13 and Мз) = рь per (om R4 
Using (3.3), (3.4) we see easily 
W,(s) = (N,(s)" and N;(5) = Ма) (в) e (3.5) 


а relation, which we expect to hold from the definitions of the games. Writing equa- 
tions (3.1), (3.2) as power series in s, the relations (3.3) are valid. Inversely (3.4), 
(3.5) yield in turn (3.1), (3.2) and ће ОМУ unbiased estimates for ру, pẹ. Differentia- 
tion of (3.5) using (3.4) gives the mean and variance of the number of trials required 
to terminate N, №, у 


E(N,) = 1(p1-3) ,E(N;) = (r Pte 3e 1 
i Pı P ДА 


4. GENERATING FUNCTIONS FOR RELATED PROBLEMS 


Consider the games G,, 6, and their g.f.’s G,(s), G;(s) defined analogously to 
N, N, except that the stopping rule specifies that the total number of heads exceeds 
the total number of tails by r for the first time [Narayana (1955); Narayana, Sathe 
and Chorneyko (1960)]. In order that the games G,, G; terminate with probability 1 
it is necessary and sufficient that р,-Ер» > q;+q2. We assume this condition holds. 


As G,(8) = рь89,_1(8) 980,18) 
G(s) = p38G,_1(8) +9184; 41(8). 
Where G(s) = G(s) = 1. we see as before 
, G(s) = І6,(6)7, Gals) = Gs(S)IG3(5)] -- (4.1) 


$G'_1(s)$, $G_1(s)$ are in fact the negative roots of the equations:

$$q_1 s z^2 - z\{1 + p_2 q_1 s^2 - p_1 q_2 s^2\} + p_2 s = 0,$$

$$q_2 s z^2 - z\{1 + p_1 q_2 s^2 - p_2 q_1 s^2\} + p_1 s = 0. \qquad (4.2)$$


Differentiating (4.2) using (4.1) we have the means and variances of the number of trials
required to terminate $G_r$, $G'_r$. These results generalise Narayana (1954). The expected
values are

$$E(G_r) = \frac{r(p_2 + q_1)}{p_2 - q_1}, \qquad E(G'_r) = \frac{r(p_2 + q_1) + 2(q_2 - q_1)}{p_2 - q_1}.$$

Their connection with Feller (1957) and the replacement of the two coins in $G_r$ by
a single coin with probabilities $p_2/(p_2 + q_1)$, $q_1/(p_2 + q_1)$ for heads and tails is clear. The
problem of obtaining unbiased estimates for $p_1$, $p_2$ in $G_r$, $G'_r$ has been treated by
Narayana, Sathe and Chorneyko (1960) and these results in turn imply the identity


s+t—1\ /s+t—l ) 
14424-02 — 0—9) — Фир = 1—u—v— ES (41-1) 
ё ) 
We are investigating an independent proof of (4.3) by analytic methods. We finally
remark that (4.3) and the use of g.f.'s simplify similar problems in unbiased
estimation (cf. Narayana and Ladouceur).
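
The expected number of trials in the games of Section 4 can be checked by simulation. The sketch below is only illustrative and rests on an assumption that is not stated in this section: that a head is followed by a toss of coin 1 and a tail by a toss of coin 2 (the rule of Section 2 is not reproduced here). The function name `play_G` and the chosen probabilities are hypothetical. Under that assumption the mean number of trials of $G_1$ should be close to $(p_2 + q_1)/(p_2 - q_1)$.

```python
import random

def play_G(r, p1, p2, first_coin, rng):
    """Play one game: stop when heads lead tails by r for the first time.

    Assumed rule (not confirmed by the text): after a head use coin 1,
    after a tail use coin 2. Requires p1 + p2 > 1 for sure termination.
    """
    coin, lead, trials = first_coin, 0, 0
    while lead < r:
        trials += 1
        p = p1 if coin == 1 else p2
        if rng.random() < p:      # head
            lead += 1
            coin = 1
        else:                     # tail
            lead -= 1
            coin = 2
    return trials

def mean_trials(r, p1, p2, first_coin, reps=20000, seed=0):
    """Monte Carlo estimate of the expected number of trials."""
    rng = random.Random(seed)
    return sum(play_G(r, p1, p2, first_coin, rng) for _ in range(reps)) / reps

if __name__ == "__main__":
    p1, p2 = 0.7, 0.8
    q1 = 1.0 - p1
    print("simulated E(G_1):", mean_trials(1, p1, p2, first_coin=1))
    print("closed form     :", (p2 + q1) / (p2 - q1))
```

Starting the same game with coin 2 (`first_coin=2`) gives the primed variant of the game.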
CONFLICTING CRITERIA OF 'GOODNESS' OF STATISTICS


By J. SETHURAMAN 
Indian Statistical Institute 
SUMMARY. In this note it is intended to show by an example that if B and C are two statistics
unbiased for a parameter $\theta$ and C has uniformly smaller variance than B, then for testing $\theta = \theta_0$ against
an alternative (say $\theta > \theta_0$) the test based on C need not have more power than B locally (i.e., near the
null hypothesis).


Basu (1953) gave an example (also quoted by Rao (1952)) to show that there
can exist estimates with uniformly smaller variance than maximum likelihood estimates.
We use the same example here.


$x_1, x_2, ..., x_n$ are $n$ independent observations from a rectangular distribution
on $(\theta, 2\theta)$, $\theta > 0$. We wish to test the null hypothesis $\theta = \theta_0$ against the alternative
$\theta > \theta_0$. Let us write

$$\xi = \max(x_1, x_2, ..., x_n), \qquad \eta = \min(x_1, x_2, ..., x_n), \qquad t = \eta/\xi.$$


Consider the statistics

$$A = (\xi, \eta) \text{ or equivalently } (\xi, t); \qquad B = \xi; \qquad C = 2\xi + \eta.$$


It is easy to verify that A is sufficient for $\theta$ but not complete; B is the maximum
likelihood estimate for $\theta$; $(n+1)C/(5n+4)$ is the minimum variance unbiased estimator
of $\theta$ linear in $\xi$ and $\eta$, and

$$V\left(\frac{n+1}{5n+4}\,C\right) = \frac{\theta^2}{(5n+4)(n+2)} < V\left(\frac{n+1}{2n+1}\,B\right) = \frac{n\theta^2}{(n+2)(2n+1)^2},$$

which show that, though B is the maximum likelihood estimate, there is another
statistic with uniformly smaller variance.


Based on these statistics we can find (by an application of the Neyman-Pearson
Lemma) uniformly most powerful tests A, B and C respectively for testing $\theta = \theta_0$
against $\theta > \theta_0$. The tests A, B and C of size $\alpha$ and their power functions $\beta_A$, $\beta_B$ and
$\beta_C$ are given below as functions of $\lambda = \theta/\theta_0$.
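Test A. Reject the null hypothesis if $\eta = \xi t > m$ and/or $\xi > 2\theta_0$, where

$$m/\theta_0 = 2 - \alpha^{1/n},$$

$$\beta_A(\lambda) = 1 - \lambda^{-n}\left[(2 - \lambda)^n - \alpha\right] \ \text{ if } 1 \le \lambda \le m/\theta_0; \qquad \beta_A(\lambda) = 1 \ \text{ if } m/\theta_0 < \lambda. \qquad (1)$$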


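Test B. Reject the null hypothesis if $\xi > \tau$, where

$$\tau/\theta_0 = 1 + (1 - \alpha)^{1/n},$$

$$\beta_B(\lambda) = 1 - \lambda^{-n}\left[1 + (1 - \alpha)^{1/n} - \lambda\right]^n \ \text{ if } 1 \le \lambda \le \tau/\theta_0; \qquad \beta_B(\lambda) = 1 \ \text{ if } \tau/\theta_0 < \lambda. \qquad (2)$$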


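Test C. Reject the null hypothesis if $2\xi + \eta > \tau$, where

$$\tau/\theta_0 = 6 - (3\alpha)^{1/n},$$

$$\beta_C(\lambda) = \frac{\left[6\lambda - 6 + (3\alpha)^{1/n}\right]^n}{3\lambda^n} \ \text{ if } 1 \le \lambda \le \frac{\tau}{5\theta_0};$$

$$\beta_C(\lambda) = 1 - \frac{2}{3}\left[\frac{6 - (3\alpha)^{1/n} - 3\lambda}{2\lambda}\right]^n \ \text{ if } \frac{\tau}{5\theta_0} \le \lambda \le \frac{\tau}{3\theta_0};$$

$$\beta_C(\lambda) = 1 \ \text{ if } \frac{\tau}{3\theta_0} < \lambda. \qquad (3)$$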


The numerical values of the powers of these tests (for the size 0.05) are given
in the following Table for some selected values of $\lambda$ and $n$.

The test A based on the sufficient statistic is naturally better than the other
two tests. But it is surprising to find that test B is more powerful than test C in the
neighbourhood of $\theta_0$. The following Table shows that test C is poorer than test B in
the neighbourhood of the null hypothesis, even though test C catches up and exceeds
test B in power for higher values of $\lambda$. The practical statistician may note that in
this region of large values of $\lambda$ the power of both the tests is so high that one
would not be much the worse by using the poorer test B.
Differentiating the power functions at $\lambda = 1$ we have

$$\beta'_A(1) = n(2 - \alpha),$$

$$\beta'_B(1) = n\left[(1 - \alpha)^{(n-1)/n} + (1 - \alpha)\right] \sim n(2 - 2\alpha),$$

$$\beta'_C(1) = n\left[2(3\alpha)^{(n-1)/n} - \alpha\right] \sim n \cdot 5\alpha.$$

This clearly shows that test B is locally more powerful than test C. 
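
The comparison of the three tests can be explored numerically. The following is a minimal sketch, not the author's computation: it codes closed-form power functions obtained here from the uniform distribution on $(\theta, 2\theta)$ with $\theta_0 = 1$, for tests that reject when $\eta > m$ or $\xi > 2\theta_0$ (A), when $\xi > \tau$ (B), and when $2\xi + \eta > \tau$ (C). The function names and the choice $n = 5$, $\alpha = 0.05$ are illustrative assumptions, and the formula for test C assumes $3\alpha \le 1$.

```python
def beta_A(lam, n, alpha):
    """Power of test A at lambda = theta/theta0 (lam >= 1)."""
    m = 2.0 - alpha ** (1.0 / n)          # critical value m / theta0
    if lam >= m:
        return 1.0
    return 1.0 - ((2.0 - lam) ** n - alpha) / lam ** n

def beta_B(lam, n, alpha):
    """Power of test B (reject if the maximum exceeds tau)."""
    tau = 1.0 + (1.0 - alpha) ** (1.0 / n)
    if lam >= tau:
        return 1.0
    return 1.0 - ((tau - lam) / lam) ** n

def beta_C(lam, n, alpha):
    """Power of test C (reject if 2*max + min exceeds tau); needs 3*alpha <= 1."""
    tau = 6.0 - (3.0 * alpha) ** (1.0 / n)
    if lam <= tau / 5.0:
        return (6.0 * lam - tau) ** n / (3.0 * lam ** n)
    if lam < tau / 3.0:
        return 1.0 - (2.0 / 3.0) * ((tau - 3.0 * lam) / (2.0 * lam)) ** n
    return 1.0

if __name__ == "__main__":
    n, alpha = 5, 0.05
    for lam in (1.0, 1.05, 1.2, 1.5, 2.0):
        print(lam, beta_A(lam, n, alpha), beta_B(lam, n, alpha), beta_C(lam, n, alpha))
```

All three functions return the size $\alpha$ at $\lambda = 1$; near $\lambda = 1$ the value for B lies above that for C, and the order is reversed for larger $\lambda$.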
MISCELLANEOUS


SAMPLING EXPERIMENTS ON THE COMBINATION OF
INDEPENDENT χ²-TESTS


By NIKHILESH BHATTACHARYA 
Indian Statistical Institute 


SUMMARY. The present communication reports on some model sampling work on the relative
powers of three methods of combining independent $\chi^2$-tests. One is based on the straightforward addition
of $\chi^2$'s and of the corresponding degrees of freedom; the second is the $P_{\lambda_n}$ technique using upper tail
probabilities associated with the $\chi^2$'s; and the third is Tippett's test (Birnbaum, 1954) based on the minimum
upper tail probability. Although the experiments are in no way exhaustive, they indicate that the
first two methods are almost equally powerful (which has interesting implications) and that the third
method is usually inferior to the other two.


1. OUTLINE OF THE EXPERIMENT 

1.1. Let $k$ denote the d.f. and $\lambda$ the parameter of non-centrality for a non-central
$\chi^2$. For each of a number of combinations of $k$ and $\lambda$, one series of independent non-central
$\chi^2$'s was built up by using Wold's table of normal deviates (1948). The three methods of
combination were then applied* to mutually exclusive sets of $n$ non-central $\chi^2$'s of each series,
$n$ being, in turn, 2, 6, 12 or 24. Table 1 summarises the results.

1.2. Another type of experiment was carried out for $\chi^2$'s with single d.f. Two
series of single d.f. $\chi^2$'s were taken, the two differing in respect of $\lambda$, and a 'mixed' series was
built up by picking up alternate elements from the two 'pure' series. The three methods of
combination were then applied to mutually exclusive sets of $n$ non-central $\chi^2$'s of the 'mixed'
series, where $n$ = 2, 6, 12 or 24. Results for such 'mixed' series are shown in Table 2.

1.3. Power figures given in the tables are all ‘estimates’ based on the model sampling 
work, although ‘true’ values were also calculated for the first and the third methods using 
Patnaik’s approximate rules (1949). These ‘true’ values agreed with the corresponding 
estimates to within limits of sampling error. The ‘estimates’ are, however, presented here 
instead of ‘true’ values, in the interest of making the power comparisons more sensitive. 

1.4. To save space, estimates of power are given for two particular levels of significance.
Results for the other levels were very similar. Also, lines for $n$ = 12 or 24 are
omitted in case the number of experiments falls below 50.
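
The three methods of combination can be illustrated by a small model sampling experiment. The sketch below is only an illustration under stated assumptions, not the procedure actually used for the tables: it treats single d.f. $\chi^2$'s generated as $(u + \sqrt{\lambda})^2$ with $u$ a pseudo-random standard normal deviate, takes $n$ even so that the $\chi^2$ tail areas have a closed form, and uses hypothetical function names throughout.

```python
import math
import random

def chi2_sf_even(x, df):
    """Upper tail area of a central chi-square with an even number of d.f."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def combined_p_values(chisqs):
    """Significance levels from the three methods for n single d.f. chi-squares."""
    n = len(chisqs)                                   # n is assumed even
    qs = [max(math.erfc(math.sqrt(c / 2.0)), 1e-300) for c in chisqs]  # upper tails
    p_sum = chi2_sf_even(sum(chisqs), n)              # addition of chi-squares
    p_prod = chi2_sf_even(-2.0 * sum(math.log(q) for q in qs), 2 * n)  # product of tails
    p_min = 1.0 - (1.0 - min(qs)) ** n                # Tippett's minimum tail area
    return p_sum, p_prod, p_min

def estimate_power(lam, n, reps, alpha, seed=0):
    """Proportion of model experiments rejected by each method."""
    rng = random.Random(seed)
    shift = math.sqrt(lam)
    hits = [0, 0, 0]
    for _ in range(reps):
        chisqs = [(rng.gauss(0.0, 1.0) + shift) ** 2 for _ in range(n)]
        for j, p in enumerate(combined_p_values(chisqs)):
            hits[j] += p < alpha
    return [h / reps for h in hits]

if __name__ == "__main__":
    print(estimate_power(lam=1.0, n=6, reps=2000, alpha=0.05))
```

With $\lambda = 0$ each estimated proportion should be close to the nominal level, which gives a simple check on the sampling scheme.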


2. RESULTS
2.1. As regards the relative powers of $\Sigma\chi^2$ and $P_{\lambda_n}$ tests, Table 1 shows that these
are almost equally efficient when the $\chi^2$'s combined have the same $k$ and $\lambda$, where $k$ = 1, 6,
12 or 24, and $\lambda$ assumes the common range of values. Table 2 indicates that some variation
in $\lambda$ does not alter the situation if $k$ = 1. For $k$ = 2, it may be recalled, the two methods
are strictly equivalent, whether the $\lambda$'s are different or not. From all these, it becomes probable
that the two methods are almost equally powerful even in the general case of combining
$\chi^2$'s with varying $k$ and $\lambda$.


*Upper tail probabilities ($q$) were, of course, easily found for $\chi^2$'s with single degrees of freedom.
For higher d.f., formula (22) given by Pearson and Hartley (1958, Introduction, pp. 13-14) was used when
$q$ > 0.001; when $q$ < 0.001, Tables of the Incomplete Gamma Function (K. Pearson, 1946) were used.
TABLE 1. RELATIVE POWERS OF THREE METHODS OF COMBINING n INDEPENDENT
χ²-TESTS WHEN k AND λ ARE EQUAL FOR ALL THE χ²'s

Columns (1)-(2): parameters k and λ of the individual non-central χ²'s combined.
Column (3): number of χ²'s combined (n). Column (4): number of model experiments.
Columns (5)-(7): estimated powers (%) at 5% level. Columns (8)-(10): estimated powers (%) at 0.1% level.

 λ      n   model          at 5% level                 at 0.1% level
            experiments    Σχ²    P_λn   Tippett's     Σχ²    P_λn   Tippett's
                           test   test   test          test   test   test
(2)    (3)     (4)         (5)    (6)    (7)           (8)    (9)    (10)
0.25 2 600 7.3 7.7 6.7 0.7 0.7 9.3 

6 200 12.0 11.5 9.0 -- ур P 

12 100 14.0 ‚17.0 11.0 3.0 3.0 - 

% 50 18.0 20.0 10.0 4.0 4.0 -- 
1.0 2 600 21.2 21.7 20.0 1.3 1.7 1.0 
6 200 39.0 37.5 24.0 7.0 7.5 1.5 
12 100 51.0 54.0 23.0 17.0 17.0 1.0 

24 50 82.0 84,0 34.0 28.0 30.0 — 
2.25 2 600 46.2 46.7 40.5 6.8 7.7 3.8 
6 200 82.0 82.0 54.5 31,5 34.0 6.5 
12 100 97.0 100.0 64.0 66.0 69.0 6.0 
2 24 50 100.0 100.0 68.0 100.0 100.0 8.0 
4.0 2 воо 12.3 73.2 64.8 22.2 22.8 12.3 
6 200 98.0 98.5 84.0 78.0 78.5 15.5 
12 100 100.0 100.0 95.0 100.0 100.0 23.0 
24 50- 100.0 100.0 100.0 100.0 100.0 32.0 

"n 

6.25 2 600 90.5 91.2 85.2 48.3 45.3 28.2 
6 200 100.0 100.0 97.0 97.0 98.0 46.5 
12 100 100.0 100.0 100.0 100.0 100.0 55.0 
24 50 100.0 100.0 100.0 100.0 100.0 70.0 
1.5 2 600 17.3 17.3 13.5 1.7 1.5 0.8 
6 200 24.5 23.5 17.5 3.5 4.0 1.0 
12 100 39.0 38.0 17.0 8.0 8.0 2.0 
24 50 68.0 68.0 22.0 22.0 22.0 2.0 
6.0 2 600 59.7 58.3 51.5 17.2 15.8 (еу 
6 200 95.0 91.5 13.0 60.0 58.5 11.5 
12 100 100.0 100.0 82.0 95.0 93.0 15.0 
24 50 100.0 100.0 90.0 100.0 - 100.0 24.0 
3.0. 3 300. . “21.7 3.0 2.7 3.3 
6 100 39.0 8.0 9.0 1.0 

12 50 68.0 22.0. 22.0 -- 
12.0 2 300 82.7 38.3 37.37 23.0 
6 100 100.0 95.0 93.0 37.0 
12 50 100.0 100.0 100.0 52.0 

150 34.0 4 4.7 4.7 
50 68.0 22.0 24.0 8.0 
х LEN 76.0 — 76.0. 57.3 
Ў 100.0 - 5 100.0 ` 100.0 82.0 

TABLE 2. RELATIVE POWERS OF THREE METHODS OF COMBINING n INDEPENDENT
SINGLE D.F. χ²-TESTS, WHEN THE λ's ARE NOT EQUAL FOR ALL THE χ²'s

Columns (1)-(2): values of λ for one set of n/2 χ²'s and for the other set of n/2 χ²'s.
Column (3): number of χ²'s combined (n). Column (4): number of model experiments.
Columns (5)-(7): estimated powers (%) at 5% level. Columns (8)-(10): estimated powers (%) at 0.1% level.

 λ      λ      n   model          at 5% level                 at 0.1% level
                   experiments    Σχ²    P_λn   Tippett's     Σχ²    P_λn   Tippett's
                                  test   test   test          test   test   test
(1)    (2)    (3)     (4)         (5)    (6)    (7)           (8)    (9)    (10)
0 1 2 eoo 13.5 13.8 14.2 1.0 0.8 0.5 
6 200: 22.0 20.5 17.0 8.0 2.0 1.0 

12 100 25.0 26.0 17.0 ТУ 470 4.0 1.0 

24 50 36.0 38.0 24.0 10.0 10.0 2.0 

0.95 4 2 600 43.7 42.8 42.9 6.8 6.0 7:3 
6 200 79.0 81.0 61.5 28.5 27.0 8.5 

12 100 99.0 99.0 77.0 60.0 61.0 15.0 

24 50 100.0 100.0 86.0 98.0 98.0 22.0 

0 6.25 2 600 61.8 61.2 63.2 15.0 14,9 16.0 
200 95.5 95.0 84.0 54.0 51.0 28.0 

12 100 100.0 100.0 96.0 98.0 97.0 36.0 

24 50 100.0 100.0 100.0 100.0 100.0 42.0 

1 2.25 2 600 33.0 33.2. 30.5 3.8 3.8 3.0 
200 61.5 62.5 38.5 16.0 17.5 5.0 

19 100 93.0, 93.0 45.0 34.0 36.0 4.0 

24 50 100.0 100.0 50.0 74.0 84.0 4.0 

1 4 2 600 52.3 59.8 48.3. 9.5 9.7 6.3 
200 87.0 86,5 65.0 36.0 87.5 8.5 

19/7 7171100 98.0 97.0 78.0 80.0 81.0 9.0 

24 50 100.0 100.0 88.0 98.0 98.0 14.0 

2.25 6.295 2 600° — 74.7 74.8 2.69.5 24.2 24.5 › 18.2 
в 200 97.5 97.5 90.0 |> 82.0 82.5 28.0 

14 100 100.0 100.0 96.0 98.0 99.0 35.0 


98.0 100.0 100.0. 50.0 


24 50 100:0 100,0 l 


2.2. Tippett's method seems to be less efficient than the other two in most cases,
although for $n$ = 2 the difference is small or sometimes even zero. The difference increases
with $n$ and is more marked for the 0.1% level. It is, however, possible that when one or a
few of the $\lambda$'s are sufficiently higher than the rest, the method may even be superior to the
other two.
2.3. That the addition method is more efficient than the (min $q$) method is indirectly
seen from Table 1: the (min $q$) method applied to a set of non-central $\chi^2$'s becomes more
efficient when applied to sub-totals of the same $\chi^2$'s.


3. FURTHER OBSERVATIONS 


3.1. Combination of $\chi^2$'s may become necessary when one has carried out a series of
goodness of fit tests or tests on a number of contingency tables, and where just one test on
the pooled data may not be meaningful or adequate. In some cases there may be need of
giving unequal weights to the tests being combined (Yates, 1955a, 1955b; Zelen, 1957), but
this point has been ignored in the present study.


3.2. The result for single d.f. $\chi^2$'s seems to have interesting implications, and in what
follows, only single d.f. $\chi^2$'s are considered. Let $x_1, x_2, ..., x_n$ be independent normally distributed
variates, each having unit s.d., but with $E(x_i) = \mu_i$; and let us suppose that the $\mu_i$'s
are free to have any sign and magnitude. Let $y_i$ be the incomplete probability integral
corresponding to $x_i$, calculated with reference to the standard normal distribution. Then
to test the hypothesis $H(\mu_1 = \mu_2 = ... = \mu_n = 0)$, one can use either $\sum_{i=1}^{n} x_i^2$ or the Sukhatme
form of $P_{\lambda_n} = \prod_{i=1}^{n} z_i$ where $z_i = 1 - 2|y_i - \tfrac12|$. Since $z_i$ is the upper tail probability corresponding
to $x_i^2$, which is a non-central $\chi^2$ with 1 d.f., Tables 1 and 2 imply that these two tests
are of nearly equal power, although $\Sigma x_i^2$ is generally believed to have some optimum
properties for this well-known model, which has direct bearing on the combination of independent
two-sided tests, and hence on tests of homogeneity.

3.3. The two criteria are, however, closely similar. Whereas $\Sigma x_i^2$ is a sum of single
d.f. $\chi^2$'s, $-2\log_e(P_{\lambda_n})$ is the sum $\Sigma(-2\log_e z_i)$, and $-2\log_e z_i$ is that value of $\chi^2$ with 2 d.f.
which corresponds to $x_i^2$ in having the same incomplete probability integral.
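
The correspondence described in para 3.3 can be checked numerically. The sketch below is illustrative only (the function names are hypothetical): it computes $z = 1 - 2|y - \tfrac12|$ from a normal deviate $x$, confirms that $z$ is the two-sided tail area of $x$, and confirms that a $\chi^2$ with 2 d.f. equal to $-2\log_e z$ has the same upper tail area, since that tail area is $e^{-\chi^2/2}$.

```python
import math

def normal_cdf(x):
    """Probability integral y of a standard normal deviate x."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def sukhatme_z(x):
    """z = 1 - 2|y - 1/2|, the upper tail area belonging to x squared."""
    return 1.0 - 2.0 * abs(normal_cdf(x) - 0.5)

def equivalent_chi2_2df(x):
    """Chi-square with 2 d.f. having the same upper tail area as x squared."""
    return -2.0 * math.log(sukhatme_z(x))

if __name__ == "__main__":
    for x in (-2.5, -0.3, 0.7, 1.96):
        z = sukhatme_z(x)
        t = equivalent_chi2_2df(x)
        print(x, z, t, math.exp(-t / 2.0))
```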

3.4. Birnbaum (1954) considered this model for the simple case $n$ = 2, and found
that the critical regions defined by the two criteria are very similar.* Earlier, Lancaster
(1949) had studied the problem of combining two-sided tests on 2×2 tables or on binomial
data, for the case where small frequencies are involved; and his work seemed to suggest that,
for combining the single d.f. $\chi^2$'s, the summation method and the $P_{\lambda_n}$ technique (using upper
tail areas) would be about equally powerful.
3.5. Earlier still, E. S. Pearson (1938) had shown that the critical region given
by low values of $P_{\lambda_n} = \prod_{i=1}^{n} [1 - 2|y_i - \tfrac12|]$ is optimum for testing whether a sample of $x$-values
$(x_1, x_2, ..., x_n)$ has probably arisen from a population $N(0, 1)$, where the alternative hypothesis
states that $x$ is $N(0, \sigma)$, with $\sigma > 1$, $y_i$ being the incomplete probability integral of $x_i$ under

*As regards Tippett's test, Birnbaum's (1954) recommendations were largely influenced by his
consideration for the heterogeneous case. For (more or less) homogeneous cases even criteria leading to
non-convex acceptance regions may be definitely superior to Tippett's test. Even for ordinary heterogeneous
cases, Tippett's test will become comparatively inefficient as $n$ increases beyond 2. This is obviously
because, unlike the $P_{\lambda_n}$ or the $\Sigma\chi^2$-test, Tippett's test is unduly dependent on one extreme observation.
the null hypothesis. This result was not entirely correct, but it suggested that the Sukhatme
form of the $P_{\lambda_n}$ test is almost as efficient as the UMP test based on $\Sigma x_i^2$. It is interesting
to note that this model is a first approximation to that mentioned in para 3.2.


3.6. Yates (1955b) remarked that the problems considered by Lancaster (1949)
were unrealistic. It seems, however, that the problem of combining two-sided tests may
arise with binomial data.


3.7. Suppose one is given some (large-sample) binomial data, arranged group-wise,
and wants to test for a preassigned proportion $p_0$ for all the groups, where the group proportions
can individually exceed or fall short of $p_0$. Mather's monograph (1951, pp. 15-20)
shows the application of $\chi^2$-tests to such problems. The $P_{\lambda_n}$ test could equally be used in
such cases, and might even be adapted to give approximate analysis of the total divergence
into components for 'deviation' and 'heterogeneity'.


4. HOMOGENEITY OF CORRELATION COEFFICIENTS 


4.1. One may next consider a situation where one has $k$ sample correlation coefficients
$r_1, r_2, ..., r_k$ based on independent random samples from $k$ bivariate normal populations.
Let the respective sample sizes be $n_1, n_2, ..., n_k$, and the population correlation coefficients
be $\rho_1, \rho_2, ..., \rho_k$; and suppose it is desired to test the hypothesis $H(\rho_1 = \rho_2 = ... = \rho_k = \rho_0)$,
$\rho_0$ being a preassigned value, where the $\rho_i$'s may either exceed or fall below $\rho_0$ individually.


4.2. The Fisherian test based on the z-transformation is well-known. K. Pearson
(1933) suggested an alternative method based on probability integrals, but this was not
properly oriented, and David (1938, pp. xxii-xxviii) rightly modified the Pearson test.
Let $p_i = \int_{-1}^{r_i} P(r/\rho_0, n_i)\,dr$, where $P(r/\rho_0, n_i)$ is the frequency function of $r_i$ under the null
hypothesis. Then the Pearson-David criterion is $\prod_{i=1}^{k} [1 - 2|p_i - \tfrac12|]$, small values of the
product being significant.

4.3. Consider the case where the $n_i$'s are so large that $y_i = \sqrt{n_i - 3}\,(z_i - \zeta_0)$ can be
regarded as standard normal deviates under the null hypothesis, where $z_i = \tanh^{-1} r_i$, and
$\zeta_0 = \tanh^{-1} \rho_0$. If now one notes that $p_i$ is the probability integral of $y_i$ also, the problem
is seen to be equivalent to that considered in para 3.2, Fisher's criterion $\Sigma y_i^2$ corresponding
to that based on $\Sigma x_i^2$, and the Pearson-David criterion to $P_{\lambda_n}$ of that para. Theoretical
considerations suggest that the Fisherian test would have some optimum properties; but it
involves approximations, while the other test is exact, and as far as the present investigation
can show, the differences in power are almost negligible in most cases.
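
The two criteria of para 4.3 can be computed side by side. The following sketch is illustrative only and makes one loud assumption: the probability integral of each $r_i$ is replaced by its large-sample normal approximation through the z-transformation, so the product criterion computed here is not the exact Pearson-David test. The function name and the sample values are hypothetical.

```python
import math

def homogeneity_criteria(rs, ns, rho0):
    """Return (sum of y^2, product criterion, approximate level of the product).

    y_i = sqrt(n_i - 3) * (atanh(r_i) - atanh(rho0)) is treated as a standard
    normal deviate under the null hypothesis (large-sample assumption).
    """
    zeta0 = math.atanh(rho0)
    sum_sq, log_prod = 0.0, 0.0
    for r, n in zip(rs, ns):
        y = math.sqrt(n - 3) * (math.atanh(r) - zeta0)
        sum_sq += y * y
        tail = math.erfc(abs(y) / math.sqrt(2.0))   # equals 1 - 2|p - 1/2|
        log_prod += math.log(tail)
    k = len(rs)
    # -2 log(product) referred to chi-square with 2k d.f. (closed form, even d.f.)
    half = -log_prod
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    level = math.exp(-half) * total
    return sum_sq, math.exp(log_prod), level

if __name__ == "__main__":
    print(homogeneity_criteria([0.62, 0.41, 0.55], [40, 60, 85], rho0=0.5))
```

The sum of squares would be referred to a $\chi^2$ with $k$ d.f.; only the product criterion is given a significance level here, because its reference distribution has an even number of d.f. and so a closed-form tail area.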

4.4. There could be many other instances where the Sukhatme form of $P_{\lambda_k}$ can
be applied to test whether a number of unknown parameters $\theta_1, \theta_2, ..., \theta_k$ are simultaneously
equal to a preassigned value $\theta_0$. For a strict test of homogeneity, however, $\theta_0$ should be left
unspecified. In such cases, the parameter $\theta_0$ has to be estimated from sample data before
carrying out homogeneity tests and the exact distribution of $P_{\lambda_k}$ becomes unknown. It is
customary to still regard $-2\log_e(P_{\lambda_k})$ as a $\chi^2$ with $2k$ d.f., but this number $2k$ is obviously
too high.
A TABLE OF THE NORMAL INTEGRAL WITH COMPLEX ARGUMENT
By PETER IHM
Botanic Institute, University of Freiburg, Germany


The function | Erf (z) = (2/л)% [ ат” dé i is known under the name Error Function. 


It has been studied extensively by Rosser (1948). For certain statistical purposes we
require


He) = lim —— ее a ] 
бе ш, ma T < arg <i ЕГ e) 
which may be called Normal Integral. We have for $z = x + iy$

$$\Phi(z) = (2\pi)^{-1/2}\, e^{y^2/2} \left[ \int_{-\infty}^{x} \cos \xi y \; e^{-\xi^2/2}\, d\xi - i \int_{-\infty}^{x} \sin \xi y \; e^{-\xi^2/2}\, d\xi \right].$$

The integrand in (1) may be expanded in a power series and yields after integration, term
by term, a confluent hypergeometric series.
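
Individual table entries can be checked by direct numerical integration. The sketch below is not the method used for the table: it assumes that the Normal Integral is the analytic continuation of the ordinary normal probability integral, integrates $e^{-\zeta^2/2}$ along the vertical segment from $x$ to $x + iy$ with a composite Simpson rule, and uses hypothetical function names.

```python
import math

def simpson(f, a, b, m=200):
    """Composite Simpson rule with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0

def normal_integral(x, y):
    """Return (Re, Im) of the normal integral at z = x + iy.

    Uses Phi(x + iy) = Phi(x) + i (2 pi)^(-1/2) * integral over t in [0, y]
    of exp(-(x + it)^2 / 2) dt, split into real and imaginary parts.
    """
    c = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    re = 0.5 * math.erfc(-x / math.sqrt(2.0)) + c * simpson(
        lambda t: math.exp(0.5 * t * t) * math.sin(x * t), 0.0, y)
    im = c * simpson(
        lambda t: math.exp(0.5 * t * t) * math.cos(x * t), 0.0, y)
    return re, im

if __name__ == "__main__":
    for x, y in ((0.0, 1.0), (1.0, 1.0), (0.5, 0.0)):
        print(x, y, normal_integral(x, y))
```

For real arguments ($y = 0$) the function reduces to the ordinary normal probability integral, which offers a quick check against standard tables.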


i 1l ^ aye 7 п 
*(=) = — À db er EA iu 
Valls Ps ascia Те d£, qox 


gives for $z \to \infty$ the asymptotic expansion


1 аа 8710220 
олш а АН ао ый 
9*6) мл ў > 2 g5 27. y ) 
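with absolute error smaller than the amount of the first neglected term. $\Phi(z)$ has been
calculated for $x = 0(0.1)...$, $y = 0(0.1)2.5$ by aid of an IBM Magnetic Drum Calculator
Type 650* using

$$\mathrm{Re}\,\Phi(z) = \tfrac12 + (2\pi)^{-1/2}\, e^{y^2/2} \int_{0}^{x} \cos \xi y \; e^{-\xi^2/2}\, d\xi,$$

$$\mathrm{Im}\,\Phi(z) = -(2\pi)^{-1/2}\, e^{y^2/2} \int_{-\infty}^{x} \sin \xi y \; e^{-\xi^2/2}\, d\xi = (2\pi)^{-1/2}\, e^{y^2/2} \int_{x}^{\infty} \sin \xi y \; e^{-\xi^2/2}\, d\xi.$$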


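The following relations hold:

$$\mathrm{Re}\,\Phi(-z) = 1 - \mathrm{Re}\,\Phi(z), \quad \mathrm{Im}\,\Phi(-z) = -\mathrm{Im}\,\Phi(z), \quad \mathrm{Re}\,\Phi(\bar z) = \mathrm{Re}\,\Phi(z), \quad \mathrm{Im}\,\Phi(\bar z) = -\mathrm{Im}\,\Phi(z),$$

$$\Phi(-z) = 1 - \Phi(z), \qquad \Phi(\bar z) = \overline{\Phi(z)},$$

$$\lim_{z \to \infty} \Phi(-z) = 0, \qquad \lim_{z \to \infty} \Phi(z) = 1 \quad \text{for } -\tfrac{\pi}{4} < \arg z < \tfrac{\pi}{4},$$

and

$$\Phi(iy) = \tfrac12 + i\,(2\pi)^{-1/2} \int_{0}^{y} e^{t^2/2}\, dt.$$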


* The work was supported by Deutsche Forschungsgemeinschaft. I am indebted to Professor 
A. Walther, Technische Hochschule Darmstadt, for the kind permission to use the electronic computer. 
TABLES OF Re Φ(z) AND Im Φ(z); z = x + iy

           y = 0.0            y = 0.1            y = 0.2            y = 0.3
 x     Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)
0.0 0.5000 0.0000 0.5000 0.0400 0.5000 0.0803 0.5000 0.1215 
0.1 0.5398 0.0000 0.5400 0.0398 0.5406 0.0799 0.5417 0.1209 
0.2 0.5793 0.0000 0.5797 0.0392 0.5808 0.0787 0.5829 9.1190 
0.3 0.6179 0.0000 0.6185 0.0382 0.6202 0.0767 0.6232 0.1160 
0.4 0.6554 0.0000 0.6562 0.0360 0.6584 0.0741 0.6622 0.1118 
0.5 ' 0.6915 0.0000 0.6923 0.0353 0.6950 0.0708 0.6995 0.1068 
0.6 0.7257 0.0000 0.7267 0.0334 0.7298 0.0669 0.7349 0.1009 
0.7 0.7580 0.0000 0.7591 0.0313 0.7624 0.0627 0.7681 0.0944 
0.8 0.7881 0.0000 0.7893 0.0290 0.7928 0.0581 0,7988 0.0874 
0.9 0.8159 0.0000: 0.8111 0.0266 0.8208 0.0533 0.8269 0.0800 
1.0 0.8413 — 0.0000. 0.8426 0.0242 0.8462 0.0484 0.8524 0.0726 
1.1 0.8643 0.0000: 0.8655 0.0218 0.8692 0.0435 0.8753 0.0651 
1.2 0.8849 . 0.0000 0.8861 0.0194 0.8806 0.0387 0.8955. 0.0579 
1.3 0.9032 0.0000 0.9043 0.0171 0.9077 0.0341 0.9133 0.0509 
1.4 0.9192 0.0000 0.9208 0.0149 0.9235 0.0298 0.9287 0.0443 
1.5 0.9332 0.0000 0.9342 0.0129 0.9371 0.0257 0.9420 0.0381 
1.6 0.9452 0.0000 0.9461 0.0111 0.9488 0.0220 0.9532 0.0325 
1.7 0.9554 0.0000 0.9562 0.0094 0.9586 0.0186 0.9626 0.0274 
1.8 0.9641 0.0000 0.9648 0.0079 0.9669 0.0156 0.9705 0.0229 
1.9 0.9713 0.0000 0.9719 0.0065 0.9738 0.0129 0.9769 0.0189 
2.0 0.9772 0.0000 0.9778 0.0054 0.9794 0.0106 0.9821 0.0155 
2.1 0.9821 0.0000 0.9826 0.0044 0.9840 0.0086 0.9862 0.0125 
2.2 . 0.9861 0.0000 0.9865 0.0035 0.9876 0.0069 0.9896 0.0100 
2.3 0.9893 0.0000 0.9896 0.0028 0.9906 0.0055 0.9922 0.0080
2.4 0.9918 0.0000 0.9921 0.0022 0.9929 0.0043 0.9942 0.0062 
2.5 0.9938 0.0000 0.9940 0.0017 0.9947 0.0034 0.9957 0.0048 
2.6 0.9953 0.0000 0.9955 0.0013 0.9960 0.0026 0.9969 0.0037 
2.7 0.9965 0.0000 0.9967 0.0010 0.9971 0.0020 0.9978 0.0028 
2.8 0.9974 0.0000 0.9976 0.0008 0.9970 0.0015 0.9984 0.0021 
2.9 0.9981 0.0000 0.9982 0.0006 0.9985 0.0011 0.9989 0.0016 
3.0 0.9987 0.0000 0.9987 0.0004 0.9989 0.0008 0.9992 0.0012
3.1 0.9990 0.0000 0.9991 0.0003 0.9992 0.0006 0.9995 0.0009 
3.2 0.9993 0.0000 0.9994 0.0002 0.9995 0.0004 0.9996 0.0006 
3.3 0.9995 0.0000 0.9995 0.0002 0.9996 0.0003 0.9998 0.0004 
3.4 0.9997 0.0000 0.9997 0.0001 0.9997 0.0002 0.9998 0.0003 
3.5 0.9998 0.0000 0.9998 0.0001 0.9998 0.0002 0.9999 0.0002 
3.6 0.9998 0.0000 0.9999 0.0001 0.9999 0.0001 0.9999 0.0002
3.7 0.9999 0.0000 0.9999 0.0000 0.9999 0.0001 1.0000 0.0001 
3.8 , 0.9999 0.0000 0.9999 0.0000 0.9999 0.0001 1.0000 0.0001 
3.9 1.0000 0.0000 1.0000 0.0000 1.0000 0.0000 1.0000 0.0000 

TABLES OF Re Φ(z) AND Im Φ(z); z = x + iy

           y = 0.4            y = 0.5            y = 0.6            y = 0.7
 x     Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)
0.0 0.5000 0.1639 0.5000 0.2081 0.5000 0.2545 0.5000 0.3038 
0.1 0.5431 0.1631 0.5451 0.2070 0.5477 0.2531 0.5508 0.2021 
0.2 0.5858 0.1605 0.5897 0.2036 0.5947 0.2480 0.6009 0.2968 
0.3 > 026274 0.1563 0.6331 0.1982 0.6404 0.2420 0.6496 0.2882 
0.4 0.6677 0.1507 0.6750 0.1908 0.6843 0.2326 0.6960 0.2766 
0.5 0.7061 0.1437 0.7148 — 0.1817 0.7259 — 0.2211 0.7398 0.2623 
0.6 0.7423 0.1856 0.7522 0.1711 0.7647 0.2078 0.7804 0.2450 
0.7 0.7761 0.1266 0.7868 0.1595 0.8005 0.1931 0.8175 0.2278 
0.8 0.8073 0.1170 0.8186 0.1470 0.8330 i 0.1775 0.8508 0.2085 
0.9 0.8957 0.1060 0.8473 0.1340 0.8620 0.1612 0,8802 0.1886 
1.0 0.8612 0.0967 0.8729 0.1200 0.8876 0.1418 ; 0.9057 0.1686 
1.1 0,8840 0.0866 0.3954 0.1078 0.9099 0.1286 0.9275 0.1490 
1.2 0.9040 0.0767 0.9150 0.0951 0.9289 — 0.1130 0.9458 0.1800 
1.3 0.9213 0.0672 0.9318 0.0830 0.9449 0.0981 0.9007 0.1121 
1.4 0.9362 0.0583 0.9460 0.0717 0.9581 0.0842 10,9727 0.0955 
1.5 . 0.9489 0.0500 0.9578 0.0612 0.9689 0.0714 0.9822 0.0804 
1.6: 0.9595 . 0.0425 0.9076 0.0517 0.9775 0.0599 0.9893 0.0668 
1.7 0.9682 0.0357 0.9754 0.0432 0.9842 0.0497 0.9946 0.0549 
1.8 ^ 0.9754 0.0297 0.9817 0.0357 0.9894 ^ 0.0407 0.9984 ^ 0.0445 
1.9 0.9812 0.0244 0.9866 0.0291 0.9932 0.0380 1.0009 ^ 0.0856 
2.0 0.9858 0.0198 0.9904 0.0236 0.9960 0.0264 1.0025 0.0282 
2.1 0.9894 0.0160 0.9933 0.0188 0.9980 0.0200 1.0033 0.0220 
2,2 0.9922 0.0127 0.9955 — 0.0149 0.9993 0.0163 1.0037 0.0160 
2,3 0.9943 — 0.0100 0.9970 0.0118 1.0002 0.0126 1: 0.0129 
2.4 0.9959 0.0078 0.9981 0.0090 ^ 1.0006 0.0096 1.0034 0.0007 
2.5 0.9971 0.0060 0.9089 0.0069 1.0009 0.0073 1.0030 0.0071 
2.6 029980 0.0046 0.9994 0.0052 1.0010 0.0054 1.0036 0.0052 
2.7 0.0987 0.0035 0.9997 0.0039 1.0009 0.0040 1.0022 0.0037 
2.8 0.9991 0.0026 0.9999 0.0020 1.0009 0.0029 1.0018 0.0027 
2.9 0.9994 0.0019 1.0000 0.0021 1.0007 0.0021 1.0014 0.0018 
3.0 0.9006 0.0014 1.0001 0.0015 1.0006 0.0015 1.0011 0.0013 
3.1 0.9998 0.0010 1.0001 0.0011 1.0005 0.0010 1.0009 0.0009 
3.2 — 0.9999 0.0007 1.0001, 0.0008 1.0004 0.0007 1.0007 0.0006 
3.3 — 0.9999 0.0005 1.0001: 0.0005 1.0003 0.0005 1.0005 0.0004 
3.4 1.0000 0.0004 1.0001 0.0004 1.0002 0.0003 1.0004 0.0002 
3.5 1.0000 0.0003 1.0001 0.0002 1.0002 0.0002 1.0003 0.0001 
3.6 1.0000 0.0002 1.0001 0.0002 1.0001 0.0001 1.0002 0.0001 
3.7 1.0000: 0.0001 1.0000 0.0001 1.0001 0.0001 1.0001 0.0001 
3.8 1.0000 0.0001 1.0000 0.0001 1.0001 0.0001 1.0001 0.0000 
3.9 1.0000 0.0001 1.0000 0.0000 1.0000 0.0000 1.0001 0.0000 
4.0 1.0000 0.0000 зы = s W, 1.0000. 0.0000 

TABLES OF Re Φ(z) AND Im Φ(z); z = x + iy

           y = 0.8            y = 0.9            y = 1.0            y = 1.1
 x     Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)
0.0 0.5000 0.3567 0.5000 0.4140 0.5000 0.4767 0.5000 0.5460 
0.1 0.5548 0.3545 0.5596 0.4113 0.5656 0.4734 0.5728 0.5420 
0.2 0.6087 0.3480 0.6182 0.4034 0.6298 0.4637 0.6440 0.5302 
0.3 0.6608 0.3375 0.6747 0.3905 0.6915 0.4480 0.7121 0.5110 
0.4 0.7105 0.3232 0.7281 0.3731 0.7496 0.4268 0.7757 0.4852 
0.5 0.7569 0.3058 0.7778 0.3518 0.8031 0.4010 0,8338 0.4530 
0.6 0.7996 0.9857 0.8230 0.3974 0.8513 0.3715 0.8854 0.4182 
0.7 0.8382 0.2636 0.8634 0.3007 0.8937 0.3393 0.9301 0.3795 
0.8 0.8724 0.2402 0.8986 0.2725 0.9300 0.3055 0.9675 0.3391 
0.9 0.9022 0.2162 0.9287 0.2437 0.9603 0.9711 0.9977 0.2981 
1.0 0.9276 0.1921 0.9537 0.2150 0.9846 0.2371 1.0211 0.2579 
Ij 0.9487 0.1685 0.9739 0.1871 1.0035 0.2042 1.0381 0.2195 
1.2 0.9659 0.1460 0.9807 0.1605 1.0174 0,1733 1.0495 0.1832 
1.3 0.9795 0.1948 1.0015 0.1359 1.0269 0.1447 1.0561 0.1507 
1.4 0.9900. 0.1054 1.0099 0.1133 1.0328 0.1180 1.0586 — 0.1215 
1.5 0.9976 0.0878 1.0154 0.0932 1.0355 0.0961 1.0580 0.0960 
1.6 1.0080 0.0721 1.0186 0.0754 1.0360 0.0763 1.0551 0.0742 
1.7 1.0065 0.0585 1.0199 0.0602 1.0347 0.0595 1.0506 0.0560 
1.8 1.0086 0.0468 1.0199 0.0472 1.0322 0.0455 1.0452 1.0412 
1.9 1.0095 0.0369 1.0189 0.0364 1.0289 0.0340 1.0393 0.0293 
2.0 — 1.0096 0.0987 1.0173 0.0977 1.0258 0.0249 0.0202 
2.1 1.0091 0.0220 1.0153 0.0206 1.0216 0.0178 0.0132 
2.2 1.0083 0.0166 1.0132 1.0151 1.0180 0.0123 1.0225 0.0081 
2.3 1.0073 0.0193 1.0111 0.0108 1.0147 0.0082 1.0180 0.0045 
2.4 1.0063 0.0090 1.0091 0.0075 1.0118 0.0052 1.0141 0.0021 
2.5 1.0053 0.0064 1.0074 0.0051 1.0093 0.0031 1.0108 0.0005 
2.6 1.0043 . 0.0045 1.0059 0.0034 1.0072 0.0017 1.0081 -—0.0004 
2.1 1.0034 0.0031 1.0046 , 0,0021 1.0054 0.0008 1.0060 —0.0009 
2.8 1.0027 0.0021 1.0035 0.0013 1.0041 0.0002 1.0043 —0.0011 
2.9 1.0021 0.0014 1.0026 0.0007 1.0030 —0.0001 1.0031 —0.0012 
3.0 1.0016 0.0009 1.0019 0.0004 1.0021 - 0.0003 1.0021 -0.0011 
3.1 1.0012 0.0006 1.0014 0.0002 1.0015 —0.0004 1.0014 —0.0009 
3.2 1.0009 9.0003 1.0010 0.0000 1.0010 —0.0003 1.0010 —0.0008 
3.3 1.0006 0.0002 1.0007 0.0000 1.0007 —0.0003 1.0006 --0.0006 
3.4 1.0004 0.0001 1.0005 -0.0001 1.0005 —0.0003 1.0004 --0.0005 
8.5 1.0008 0.0000 1.0003 — 0.0001 1.0003 — 0.0002 1.0002 —0.0003 
3.6 1.0002 0.0000 1.0002 —0.0001 1.0002 —0.0002 1.0001 0.0002 
3.7 1.0001 0.0000 1.0001 — 0.0001 1.0001 -0.0001 1.0001 — 0.0002 
8.8 1.0001 0.0000 1.0001 0.0000 1.0061 --0.0001 1.0000 -0.0001 
3.9 1.0001 0.0000 1.0001 0.0000 1.0000 —0.0001 1.0000 —0.0001 
4.0 1.0000 0.0000 1.0000 0.0000 1.0000 0.0000 1.0000 0.0001 




ws =: aay 1.0000 0.0000 

TABLES OF Re Φ(z) AND Im Φ(z); z = x + iy

           y = 1.2            y = 1.3            y = 1.4            y = 1.5
 x     Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)
be M Men 0.5000 ` 0.7106 0.5000 0.8100 0.5000 0.9242 
: . : 0.5925 0.7046 0.6058 0.8025 0.6222 0.9151 
0:2 0.6613 0.6040 0.6825 0.6868 0.7084 0.7807 0.7405 0.8880 
0:3: 0.7371 0.5806 0.7677: 0.6581 0.3051 0.7454 0.8512 0.8445 
0.4 0.8014 0.5499 0.8461 0.6198 0.8933 0.6985 0.9511 0.7867 
0:5 0.8709 0.5112 0.9160 0.5736 0.9708 0.6420 1.0977 0.7174 
0.6 0.9266 0.4680 0.9762 9.5213 1.0364 0.5785 1.1094 0.6400 
0.7 0.9737 0.4214 1.0261 0.4652 1.0892 0.5106 1.1652 0.5578 
0.8 1.0122 0.8730 1.0655 0.4072 1.1291 0.4411 1.2052 0.4742 
0.9 1.0421. -0.3244 1.0946 0.3495 1.1567 0.9795 1.9303 0.3924 
1.0 1.0639 0.2771 1.1141 “0.2087 1.1729 0.3068 1.9417 0.3151 
1.1 1.0784 0.9321 1.1251 0.2413 1.1791 0.9459 1.2415 0.2443 
1.2 1.0865 0.1906 1.1288 0.1936 1.1771 0.1911 1.2319 0.1817 
1.3 1.0892 0.1532 1.1266 0.1511 1.1685 0.1433 1.2181 0.1980 
1.4 1.0876 0.1203 1.1198 — 0.1144 1.1551 0.1027 1.1935 0.0836 
1.5 ^ 1.0828 0.0921 1.1098. 0.0836 1.1387 0.0694 1.1092 0.0483 
1:6 1.0758 0.0684 1.0978 0.0583 1.1207 0.0429 1.1439 0.0213 
1.7 1.0675 0.0491 1.0849 0.0383 1.1024 0.0228 1.1192 0.0017 
1.8 1.0586 0.0938 1.0720 0.0230 1.0847 0.0081 1.0962 -0.0114 
19 1.0496 0.0220 1.0595 0.0117 1.0684 —0.0020 1.0755 —0.0194 
2:0 - 1.041] 0.0132 1.0482 0.0037 1.0539 —0.0084 1.0575 —0.0234 
2.1 1.0334 0.0068 1.0381 —0.0015 1.0413 -0.0119 1.0425 -0.0245 
2.2 1.0265 0.0025 1.0294 —0.0047 1.0309 —0.0133 1.0304 —0.0235 
23 1.0206 —0.0003 10222 —0.0063 1.0224 —0.0133 1.0209 —0.0212 
2.4 1.0157 -0.0020 1.0163 —0.0068 1.0158 —0.0123 1.0136 -0.0184 
9.5: 1.0117 -0.0028 1.0117 —0.0066 1.0107 .—0.0108 1.0084 —0.0153 
2:6. 1.0085 -0.0030 1.0082 —0.0060 1.0069 -0.0091 1.0047 —0.0123 
27 1,0060 -0.0030 1.0055 —0.0052 1.0043 —0.0074 1.0022 —0.0096 
2.8 1.0042 —0.0027 1.0036 —0.0043 1.0024 —0.0059 1.0006 —0.0072 
3.9 1.0028 —0.0023 1.0022 —0.0034 1.0012 —0.0045 0.9997 —0.0053 
3.0 1.0019 —0.0019 1.0013 —0.0027 1.0004 —0.0033 0.9998 —0.0038 
3. 1.0012 —0.0015 1.0007 —0.0020 1.0000 —0.0024 0.9991 —0.0026 
3.2 1.0007 —0.0011 1.0008 -0.0015 0.9998 —0.0017 0.9991 —0.0018 
3.3 1.0004 —0.0009 1.0001 -0.0011 0.9997 —0.0012 0.9992 -0.0011 
3.4 1.0002 -0.0006 1.0000 —0.0007 0.9997 —0.0008 0.9993 —0.0007 
3.5 1.0001 —0.0004 0.9999 —0.0005 0.9997 —0.0005 0.0005 —0.0004 
3.6 1.0000 —0.0003 0.9999 —0.0003 0.9998 —0.0003 0.9996 —0.0002 
37 1.0000 —0.0002 0.9999 —0.0002 0.9998 —0.0002 0.9997 —0.0001 
3.8 1.0000 —0.0001 0.9999 —0.0001 0.9999 —0.0001 0.9998 —0.0001 
3.9 1.0000 -0.0001 0.9999 -0.0001 0.9999 -0.0001 0.9999 0.0000 
4.0 1.0000 -0.0001 1.0000 —0.0001 0.9999 0.0000 0.9999 0.0000 
411 1.0000 0.0000 1.0000 0.0000 0.9999 0.0000 0.9999 0.0000 
= 1.0000 0.0000 1.0000 0.0000 



TABLES OF Re Φ(z) AND Im Φ(z); z = x + iy

           y = 1.6            y = 1.7            y = 1.8            y = 1.9
 x     Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)   Re Φ(z)  Im Φ(z)
0.0 0.5000 1.0571 0.5000 1.2129 0.5000 1.3977 0.5000 1.6190 
0.1 0.6426 . 1.0456 0.6681 1.1986 0.7002 1.3797 0.7407 1.5960 

.2 0.7803 1.0120 0.8298 1.1565 0.8920 1.3266 0.9704 1.5288 
0.3 0.9082 0.9579 0.9790 1.0890 1.0076 1.2419 1.1792 1.4216 
0.4 1.0224 0.8864 1.1107 1.0000 1.2206 1.1304 1.3584 1.2812 
0.5 1.1198 0.8111 1.2209 0.8943 1.3461 0.9988 1.5021 1.1162 
0.6 -. 1.1983 0.7069 1.3072 0.7775 1.4412 0.8542 1.6067 р 
0.7 1.2572 0.6062 1.3688 0.6554 1.5049 0.7042 1.0715 0.7512 
0.8 1.2965 0.5054 1.4001 0.5334 1.5383 0,5560 1.6983 0.5704 
0.9 1.3175 0.4078 1.4210 0.4166 1.5442 0.4159 1.6911 0.4018 
1.0.. 1.3322 0.3166 1.4164 0.3090 1.5266 0.2889 1.6555 0.9517 
1.17 1.3133 0.2345 1.3958 0.2138 1.4904 0.1787 1.5983 0.1245 
1.3 1.2998 0.1639 1.3633 0.1329 1.4408 0.0874 1.5204 0.0221 
1.3 1.2665 0.1034 1.3227 0.0670 1.3830 0.0155 1.4407 —0.0550 
1.4 1.2946. 0.0555 1,2777 0.0160 1.3218 —0.0875 1.3651 —0.1082 
1.5 1,2005 0.0187 1.2317 —0.0211 1.2612 —0.0733 1.2868 —0.1403 
1.6 1.1665 —0.0079 1.1873 —0.0459 .2044 -0.0943 1.2153 —0.1547 
1.7 1.1343 0.0256 1.1464 -0.0603 1.1535 —0.1032 1.1532 +0.1553 
1.8 1.1051 —0.0361 1.1103 -0.0665 1.1099 -0.1030 1.1016 —0.1459 
1.9 1.0797 -0.0408 1.0798 -0.0664 1.0741 -0.0963 1.0608 -0.1300
2.0 1.0682 —0.0414 1.0549 —0.0621 1.0460 —0.0855 1.0301 —0.1108 
2.1 1.0408 —0.0390 1.0353 —0.0552 1.0249: —0.0727 1.0084 —0.0906 
2.2 1.0272 —0.0348 1.0207 —0.0470 1.0100 —0.0594 0.9942 —0.0712 
2.3 1.0170 -0.0298 1.0102 -0.0386 1.0001 -0.0468 0.9860 -0.0537
2.4 1.0096 -0.0946 1.0032 —0.0305 0.9941 —0.0356 0.9822 —0.0388 
2.5 1.0045 —0.0196 0.9988 -0.0933 0.9911 —0.0260 0.9814 —0.0268 
2.6 — 1,0001 —0.0151 0.9968 —0.0172 0.9901 —0.0182 0.9826 —0.0175 
2.7 0.9992 —0.0113 0.9953 -0.0193 0.9904 —0.0122 0.9849 —0.0106 
2.8 — 0.9982 0.0082 0.9951 —0.0084 0.9915  —0.0077 0.9876 —0.0058 
2.9 0.9978 —0.0057 0.9955 —0.0055 0.9929 --0.0046 0.9903 —0.0025 
3.0 0.9978 —0.0039 0.9961 — 0.0035 - 0.9944 —0.0024 0.9927 —0.0006 
3.1 0.9980 —0.0025 0.9968 — 0.0020 0.9957 —0.0010 0.9948 0.0005 
3.2 0.9983 —0.0016 0.9975 —0.0011 0.9969 — 0.0002 0.9964 0.0010 
3.3 0.9987 -0.0009 0.9982 -0.0005 0.9978 0.0002 0.9976 0.0012
3.4 0.9990 —0.0005 0.9987 '—0.0001 0.9985 0.0004 0.9985 0.0011 
3.5 0.9993 — 0.0002 0.9991 0.0001 0.9990 0.0005 0.9991 0.0009 
3.6 — 0.9995 -0.0001 0.9994 — 0.0001 0.9994 — 0.0004 0.9995 0.0007 
3.7 0.9996 0.0000 0.9996 0.0002 0.9996 0.0004 0.9998 0.0006 
3.8 0.9998 0.0000 0.9998 0.0002 0.9998 0.0003 0.9999 0.0004
3.9 0.0998 0.0001 0.9999 0.0001 0.9999 0.0002 1.0000 0.0003 
4.0 0.9999 0.0000 0.9999 0.0001 1.0000 0.0001 1.0000 0.0002 
4.1 0.9999 0.0000 1.0000 0.0001 1.0000 0.0001 1.0000 0.0001 
4.2 1.0000 0.0000 1.0000 0.0000 1.0000 0.0001 1.0000 0.0001
4.3 - - - - 1.0000 0.0000 1.0000 0.0000


TABLE OF THE NORMAL INTEGRAL WITH COMPLEX ARGUMENT


TABLES OF Re Φ(z) AND Im Φ(z); z = x + iy


y = 2.0 y = 2.1 y = 2.2 y = 2.3

x Re Φ(z) Im Φ(z) Re Φ(z) Im Φ(z) Re Φ(z) Im Φ(z) Re Φ(z) Im Φ(z)
0.0 0.5000 1.8866 0.5000 2.2135 0.5000 2.6168 0.5000 3.1195 
0.1 0.7923 1.8573 0.8586 2.1757 0.9443 2.5678 1.0560 3.0554
0.2 1.0702 1.7714 1.1981 2.0652 1.3630 2.4245 1.6775 2.8681
0.3 1.3200 1.6348 1.5010 1.8899 1.7399 2.1079 2.0335 2.5729
0.4 1.5323 1.4567 1.7530 1.6693 2.0354 1.9051 2.3001 2.1992
0.5 1.6977 1.9485 1.9444 1.3980 2.2576 1.5671 2.6580 1.7580
0.6 1.8126 1.0232 2.0700 1.1142 3.9998 1.9079 2.8035 1.9087 
0.7 1.8766 0.7938 1.8854 0.5834 2.4451 0.8485 2.8384 0.8463 
0.8 1.8926 0.5723 3.1579 0.5842 2.4188 0.5116 2.774 0.4981 
0.9 1.8664 0.3689 2.4039 0.6373 2.3273 0.2133 2.6282 0.0658 
1.0 1.8061 0.1915 2.6137 0.7916 2.1863 -0.0345 2.4230 -0.2264
1.1 1.7208 0.0449 2.7816 0.8546 2.0128 -0.2255 2.1820 -0.4415
1.2 1.6197 -0.0686 3.9057 0.9938 1.8298 -0.3586 1.0278 -0.5799
1.3 1.5119 -0.1494 1.5760 -0.2743 1.6345 -0.4373 1.6803 -0.6481
1.4 1.4051 -0.2003 1.4377 -0.3185 1.4571 -0.4685 1.4551 -0.6570
1.5 1.3052 -0.9251 1.3120 -0.3310 1.3007 -0.4613 1.2626 -0.6198
1.6 1.2167 -0.2290 1.2039 -0.3189 1.1707 -0.4259 1.1085 -0.5510
1.7 1.1421 -0.2173 1.1160 -0.2898 1.0691 -0.3723 0.9941 -0.4639
1.8 1.0824 -0.1952 1.0486 -0.2502 0.9953 -0.3095 0.0168 -0.3704
1.9 1.0879 —0.1671 1.0008 —0.2062 0.9466 —0.2449 0.8717 —0.2796 
2.0 1.0052 -0.1369 0.9689 —0.1622 0.9189 —0.1839 0.8525 —0.1980 
2.1 0.9843 -0.1075 0.9513 -0.1217 0.9077 —0.1302 0.8524 —0.1293 
2.2 0.9725 —0.0808 0.9441 —0.0865 0.9083 -0.0858 0.8650 -0.0751 
2.3 0.9675 -0.0579 0.9443 -0.0578 0.9106 -0.0511 0.3848 -0.0350
2.4 0.9672 -0.0393 0.9493 -0.0354 0.9290 -0.0256 0.9073 -0.0076
2.5 0.9699 -0.0248 0.9568 -0.0191 0.0429 -0.0081 0.9295 0.0094
2.6 0.9741 —0.0142 0.9651 —0.0078 0.9564 0.0028 0.9493 0.0183 
2.7 0.9790 —0.0069 0.9732 —0.0006 0.9684 0.0087 0.9658 0.0215 
2.8 0.9837 —0.0021 0.9804 0.0034 0.9784 0.0111 0.9786 0.0211 
2.9 0.9880 0.0007 0.9864 0.0053 0.9861 0.0113 0.9880 0.0182 
3.0 0.9915 0.0021 0.9910 0.0057 0.9918 0.0101 0.9943 0.0151 
3.1 0.9948 0.0027 0.9945 0.0053 0.9957 0.0084 0.9983 0.0115 
3.2 0.9963 0.0026 0.9969 0.0045 0.9982 0.0065 1.0006 0.0083 
3.3 0.9978 0.0023 0.9985 0.0036 0.9997 0.0047 1.0017 0.0056 
3.4 0.0988 0.0019 0.9994 0.0027 1.0005 0.0033 1.0020 0.0036 
3.5 0.9994 0.0015 1.0000 0.0019 1.0008 0.0022 1.0019 0.0021 
3.6 0.9998 0.0011 1.0002 0.0013 1.0009 0.0013 1.0016 0.0011 
3.7 1.0000 0.0007 1.0003 0.0008 1.0008 0.0008 1.0012 0.0005 
3.8 1.0001 0.0005 1.0003 0.0005 1.0006 0.0004 1.0009 0.0002
3.9 1.0001 0.0003 1.0008 0.0003 1.0005 0.0009 1.0006 0.0000 
4.0 1.0001 0.0002 1.0002 0.0001 1.0008 0.0001 1.0004 —0.0001 
4.1 1.0001 0.0001 1.0002 0.0001 1.0002 0.0000 1.0009 -0.0001 
4.2 1.0001 0.0001 1.0001 0.0000 1.0001 0.0000 1.0001 -0.0001
4.3 1.0001 0.0000 1.0001 0.0000 1.0001 0.0000 1.0001 0.0001
4.4 1.0000 0.0000 1.0000 0.0000 1.0000 0.0000 1.0000 -0.0001
4.5 - - - - - - 1.0000 0.0000
TABLES OF Re Φ(z) AND Im Φ(z); z = x + iy


y = 2.4 y = 2.5

x Re Φ(z) Im Φ(z) Re Φ(z) Im Φ(z)

0.0 0.5000 3.7524 0.5000 4.5570 
0.1 1.2027 3.6677 1.8971 4.4444
0.2 1.8586 3.4210 2.2301 4.1168
0.3 2.4258 3.0835 2.9414 3.6038
0.4 2.8710 2.5374 3.4882 2.9506
0.5 3.1734 1.9728 3.8417 2.2127 
0.6 3.8253 1.3827 3.9940 1.4495 
0.7 3.3322 0.8089 3.9552 0.7175
0.8 3.2115 0.2876 3.7515 0.0651 
0.9 2.9888 —0.1533 3.4204 —0.4720 
1.0 2.6955 —0.4965 3.0058 —0.8729 
1.1 2.3038 -0.7851 2.5526 -1.1316
1.2 2.0245 -0.8722 2.1019 -1.2559
1.3 1.7031 -0.9188 1.6872 -1.2639
1.4 1.4193 -0.8916 1.3327 -1.1806
1.5 1.1852 -0.8100 1.0519 -1.0343
1.6 1.0062 -0.6938 0.8488 -0.8525
1.7 0.8815 -0.5615 0.7193 -0.6595
1.8 0.8058 -0.4280 0.6538 -0.4748
1.9 0.7708 —0.3046 0.6389 —0.3117 
2.0 0.7670 —0.1987 0.6603 —0.1780 
2.1 0.7845 -0.1136 0.7043 -0.0762
2.2 0.8147 -0.0501 0.7593 -0.0050
2.3 0.8504 —0.0062 0.8162 0.0394 
2.4 0.8862 0.0209 0.8687 0.0624 
2.5 0.9186 0.0349 0.9132 0.0697 
2.6 0.9457 0.0395 0.9483 0.0667 
2.7 0.9669 0.0380 0.9738 0.0577 
2.8 0.9823 0.0330 0.9911 0.0463 
2.9 0.9928 0.0266 1.0016 0.0347 
3.0 0.9992 0.0201 1.0070 0.0243 
3.1 1.0027 0.0142 1.0091 0.0159
3.2 1.0042 0.0095 1.0090 0.0095 
3.3 1.0044 0.0059 1.0078 0.0051 
3.4 1.0039 0.0033 1.0061 0.0022 
3.5 1.0032 0.0016 1.0045 0.0005 
3.6 1.0024 0.0006 1.0031 —0.0004 
3.7 1.0017 0.0000 1.0020 —0.0008 
3.8 1.0011 -0.0002 1.0019 -0.0008
3.9 1.0007 -0.0003 1.0006 -0.0007
4.0 1.0004 —0.0003 1.0003 —0.0006 
4.1 1.0002 —0.0003 1.0001 —0.0004 
4.2 1.0001 —0.0002 1.0000 —0.0003 
4.3 1.0000 —0.0001 1.0000 —0.0002 
4.4 1.0000 —0.0001 0.9999 —0.0001 
4.5 1.0000 -0.0001 1.0000 -0.0001
4.6 1.0000 0.0000 1.0000 0.0000 
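
Individual table entries can be spot-checked numerically. The following is a minimal sketch, assuming the tabulated function is the standard normal integral Φ(z) = (2π)^(-1/2) ∫ from -∞ to z of exp(-t²/2) dt continued to complex z = x + iy; the function name `normal_integral` and the truncation at 200 series terms are illustrative choices, not part of the tables.

```python
import math

def normal_integral(z, terms=200):
    """Phi(z) = 1/2 + (2*pi)**-0.5 * integral from 0 to z of exp(-t*t/2) dt."""
    z = complex(z)
    # Maclaurin series of the integral: sum_k (-1)^k z^(2k+1) / (2^k k! (2k+1)).
    # `power` holds (-1)^k z^(2k+1) / (2^k k!) and is updated term by term.
    power = z
    total = z  # k = 0 term
    for k in range(1, terms):
        power *= -z * z / (2 * k)
        total += power / (2 * k + 1)
    return 0.5 + total / math.sqrt(2.0 * math.pi)

# Compare with the tabulated entry for x = 0.0, y = 2.0 (Re 0.5000, Im 1.8866).
value = normal_integral(complex(0.0, 2.0))
print(round(value.real, 4), round(value.imag, 4))
```

The power series is adequate for the moderate arguments covered by the tables; for much larger |z| the alternating terms cancel badly and a different method would be preferable.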
A FURTHER CONTRIBUTION TO THE NUMERICAL EVALUATION OF
CERTAIN MULTIVARIATE NORMAL INTEGRALS 


By PETER IHM 
Botanic Institute, University of Freiburg, Germany 


It has been shown in a previous paper (Ihm 1959) how the integral


1 


a -Ы А-а UN TE -, 
1 = Gara i | d : dz, ...@ with A=D+ уй 


B 


can be solved numerically for real i. D is a diagonal matrix, i a unit vector, B an
n-dimensional interval; A, D positive definite. It has also been shown that for


A* = PT--c? ii’ 


there exists an orthogonal matrix S giving 


sats = vr. 1 ( 
с 


where o and O are the null vector and the null matrix, respectively. For $\mathbf{i} = \mathbf{i}_1$ or $\mathbf{i} = i\,\mathbf{i}_1$
($\mathbf{i}_1$ real) A* corresponds to the class of all normal distributions having density constant on the
surface of a hyperellipsoid which is rotationally symmetric about one axis. We have (see
Ihm, 1959)


1 = em |D] 3n [exp | der? kar! Daria. айт. 


with $\mathbf{i} = \mathbf{i}_1$. This holds also for $\mathbf{i} = i\,\mathbf{i}_1$. By repeated application of a theorem given by
Cramér (1946, pp. 68-69) it can be shown that the order of integration may be inverted.


Thus, for 


IG) = @л)-'|р|-+[ exp Cri! Dari) йі... den 


=L= облу ] U(r) exp (—4e79 dr. 


The numerical evaluation for $\mathbf{i} = i\,\mathbf{i}_1$ is possible only if a table of the normal integral with
complex argument is available. This function may be defined by
a table is published elsewhere in this journal. If the diagonal elements of D are $\sigma_j^2$, and
B is given by $x_j \le a_j$ for $j = 1, 2, \ldots, n$, $\mathbf{i} = (\epsilon_j)$,

» . 
Қт) = П $(a;—iregjo;) 


It may sometimes be more convenient to use $\tau^* = c\tau$ instead of $\tau$. We then obtain


2 P= (Эл) ? I(r*/e) exp (— 47%) dr* = (22)! ? П ф({и;—їт*єу/с}/с j)exp( — yr*?ydr* 
-- -œ jel 
AN ELEMENTARY THEORY OF THE BROWNIAN MOTION
RANDOM FUNCTION* 


By J. KAMPÉ DE FÉRIET
University of Lille, France 


SUMMARY. By making a systematic use of the existence of a Schauder basis in the Banach
space $C_0[0, 1]$ consisting of all continuous functions defined on the interval [0, 1] and vanishing at
0 and 1, the Brownian motion random function is expanded into an infinite series which is almost surely,
absolutely and uniformly convergent over every bounded interval. This shows that the Brownian motion
random function is continuous with probability one. The same idea is used to define the well-known
Wiener integral for square integrable functions. The importance of this method lies in the fact that it
leads to an extremely elementary approach to the theory of Brownian motion.


1. INTRODUCTION 


In his pioneering work, N. Wiener (1923) gave for the first time a rational
theory of the random function $W(t)$, representing the abscissa, at time $t$, of a particle
starting at time 0 from the origin and submitted to the random impulsions of a liquid
in Brownian motion. But his point of view (the more complete expositions are to be
found in Wiener (1930) and Paley and Wiener (1941)), based on a definition of a measure
in the space of all real functions, is highly abstract. A more elementary approach
has been given by Paul Lévy (1948) and (1952), consisting essentially in an approximation
of the curve $x = W(t)$ by a sequence of polygonal lines.

In our researches on probability measures in Banach spaces (Fériet (1957)
and (1960)) we have noticed that, using the triangular functions $e_n(t)$ (which define a
basis in the Banach space $C_0[0, 1]$), we were (as a very particular case of our general
results) able to prove that

$$W(t) - tW(1) = \sum_{n=1}^{+\infty} \eta_n e_n(t) \qquad \ldots (1.1)$$


*A lecture given at the Indian Statistical Institute, Calcutta; notes prepared by K. R.
Parthasarathy, R. Ranga Rao, and S. R. S. Varadhan.
(where $\eta_1, \eta_2, \ldots$ are normal independent random variables, independent of $W(1)$),
the series being almost surely, absolutely and uniformly convergent in [0, 1].
The sums

$$\sum_{n=1}^{N} \eta_n e_n(t) \qquad \ldots (1.2)$$

correspond of course to polygonal lines and, from this point of view, we are on the same
line of approach as that of Paul Lévy. But the systematic use of the triangular functions
$e_n(t)$ gives us a formalization of his ideas, which seems to us to clarify and simplify
the problem. The series (1.1) is to be found in a casual remark in our papers (Fériet
(1957), p. 816 and (1960), p. 153) on probability measures in $C_0[0, 1]$.

The proof of the almost sure, absolute and uniform convergence of (1.1) given
by the author (Fériet (1957)) is quite elementary and can be used in any course on
probability for graduate students. Thus it seems worth while to sketch an
exposition of the theory of the Brownian motion function $W(t)$ based on this
simple representation; we hope that this small monograph will be useful for
research workers in theoretical physics, who use $W(t)$ but are not always familiar
with measures in function spaces.


2. PRELIMINARIES


Let $J_1 = [0, 1)$ be the semi-open unit interval, and $N_q$ the set of all integers
$n$ such that $2^{q-1} \le n < 2^q$. For any integer $n$ there exists a unique pair of integers
$(q_n, p_n)$ such that $n = 2^{q_n-1} + p_n$, where $0 \le p_n < 2^{q_n-1}$.


Let

$$J_n = \left[\frac{p_n}{2^{q_n-1}},\ \frac{p_n+1}{2^{q_n-1}}\right) \qquad \ldots (2.1)$$

$$t_n = \frac{2p_n+1}{2^{q_n}} \qquad \ldots (2.2)$$


Thus, for instance, $J_1 = [0, 1)$, $J_2 = [0, \tfrac12)$, $J_3 = [\tfrac12, 1)$ and so on. As $n$ varies over
$N_q$ the class of sets $\{J_n\}$ defines a partition of $J_1$, i.e.,

$$J_n \cap J_m = \emptyset \quad (n \ne m;\ n, m \in N_q), \qquad \bigcup_{n \in N_q} J_n = J_1.$$


For any $t \in J_1$, it is clear that there exists a unique sequence $\{n_q\}$, $n_q \in N_q$, such that

$$\{t\} = \bigcap_{q=1}^{\infty} J_{n_q}.$$


The point $t_n$ defined by (2.2) is the midpoint of the interval $J_n$ and as $n$ varies over
the integers $n = 1, 2, \ldots$, $t_n$ runs over all binary points of $J_1$.
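
The indexing of the intervals $J_n$ and midpoints $t_n$ can be illustrated with a short sketch. The function names `level_and_offset`, `interval` and `midpoint` are illustrative only; exact fractions are used so that the binary points are represented without rounding.

```python
from fractions import Fraction

def level_and_offset(n):
    """Return (q, p) with n = 2**(q-1) + p and 0 <= p < 2**(q-1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    q = n.bit_length()          # 2**(q-1) <= n < 2**q
    return q, n - 2 ** (q - 1)

def interval(n):
    """End points of J_n = [p / 2**(q-1), (p+1) / 2**(q-1))."""
    q, p = level_and_offset(n)
    width = Fraction(1, 2 ** (q - 1))
    return p * width, (p + 1) * width

def midpoint(n):
    """Midpoint t_n = (2p + 1) / 2**q of J_n."""
    q, p = level_and_offset(n)
    return Fraction(2 * p + 1, 2 ** q)

for n in range(1, 8):
    print(n, level_and_offset(n), interval(n), midpoint(n))
```

For each level $q$, the intervals printed for $n \in N_q$ tile $[0, 1)$ from left to right, which is the partition property stated above.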
Now we consider the space $C_0[0, 1]$ of real valued continuous functions defined
on [0, 1] and vanishing at 0 and 1.


For any $x(t) \in C_0[0, 1]$, we write

$$\|x\| = \sup_{0 \le t \le 1} |x(t)| \qquad \ldots (2.3)$$


Under the norm (2.3) the space $C_0[0, 1]$ becomes a separable Banach space. The essential
idea of the present approach is to exploit the existence of a very simple basis in
the space $C_0[0, 1]$. By a basis in a Banach space $\mathcal{X}$ we mean a sequence $\{y_n\}$ of elements
in $\mathcal{X}$ such that every element $x \in \mathcal{X}$ has a unique representation of the form

$$x = \sum_{n=1}^{\infty} a_n y_n.$$

A basis for the space $C_0[0, 1]$ in the above sense was given by Schauder (1927) with
the help of functions $e_n(t)$, $(n = 1, 2, \ldots)$ defined as follows,

$$e_n(t) = \begin{cases} 0 & \text{for } t \notin J_n \\ 2^{q_n}\left(t - \dfrac{p_n}{2^{q_n-1}}\right) & \text{for } t \in J_{2n} \\ 2^{q_n}\left(\dfrac{p_n+1}{2^{q_n-1}} - t\right) & \text{for } t \in J_{2n+1} \end{cases} \qquad \ldots (2.4)$$


Since $J_{2n} \cap J_{2n+1} = \emptyset$ and $J_{2n} \cup J_{2n+1} = J_n$ the above function is well defined. As
an aid to understanding it may be added that the function is the triangular function
vanishing outside $J_n$ and taking its maximum value 1 at the midpoint $t_n$ of $J_n$.


Theorem 1 (Schauder): For any $x(t) \in C_0[0, 1]$ there exists a unique sequence
$\eta_1, \eta_2, \ldots, \eta_n, \ldots$ of real numbers such that the infinite series $\sum \eta_n e_n(t)$ converges uniformly
to $x(t)$.

Proof: Let us write

$$x_1(t) = \eta_1 e_1(t),$$
$$x_2(t) = \eta_1 e_1(t) + \eta_2 e_2(t),$$
$$x_n(t) = \eta_1 e_1(t) + \ldots + \eta_n e_n(t),$$

where the constants $\eta_1, \eta_2, \ldots, \eta_n$ are determined by the requirement that $x_n(t)$ agrees
with $x(t)$ at the points $t_1, t_2, \ldots, t_n$ (the $t_n$'s defined by (2.2)). More precisely,

$$\eta_1 = x(t_1), \qquad \eta_n = x(t_n) - x_{n-1}(t_n). \qquad \ldots (2.5)$$

An elementary computation shows that

$$\eta_n = x(t_n) - \frac12\left[x\left(\frac{p_n}{2^{q_n-1}}\right) + x\left(\frac{p_n+1}{2^{q_n-1}}\right)\right].$$

It is now evident that the function $x_n(t)$ constructed in this way is merely a polygonal
line $P_n$ agreeing with $x(t)$ at the points $t_1, t_2, \ldots, t_n$ and linear in between. It is then
obvious that $x_n(t)$ converges uniformly to $x(t)$ in $J_1$. The polygonal line $P_n$ agrees with
$P_{n-1}$ except on $J_n$, where the side AB of $P_{n-1}$ is replaced by a triangle ABC, C having
$t_n$ as abscissa and $x(t_n)$ as ordinate. This property is an important feature of the
approximation of $x(t)$ by the sequence $x_n(t)$: a new step in the approximation does
not alter the result previously obtained; the fitting of a new point $x(t_n) = x_n(t_n)$ does
not destroy the preceding fittings.


To see uniqueness suppose

$$x(t) = \sum_{n=1}^{\infty} \lambda_n e_n(t)$$

is another representation. Since $e_n(t_j) = 0$ for all $n > j$, this leads to the same relations
as those obtained for the definition of $\eta_n$ in the last paragraph and consequently
$\lambda_n = \eta_n$, where $\eta_n$ is given by (2.5).


Thus the sequence of functions $e_n(t)$ constitutes a basis for $C_0[0, 1]$, the Schauder
basis. The expansion of $x(t)$ in terms of $e_1, e_2, \ldots$ is called the Schauder series of $x(t)$.
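
The construction in the proof can be traced numerically. The sketch below assumes the closed form of the coefficient, $\eta_n = x(t_n) - \tfrac12[x(\text{left end of } J_n) + x(\text{right end of } J_n)]$, and uses the test function $x(t) = t(1-t)$, which is an arbitrary choice made here for illustration; all function names are hypothetical.

```python
def level_and_offset(n):
    """Return (q, p) with n = 2**(q-1) + p and 0 <= p < 2**(q-1)."""
    q = n.bit_length()
    return q, n - 2 ** (q - 1)

def e(n, t):
    """Triangular function e_n(t): zero outside J_n, equal to 1 at the midpoint t_n."""
    q, p = level_and_offset(n)
    left = p / 2 ** (q - 1)
    right = (p + 1) / 2 ** (q - 1)
    mid = (left + right) / 2
    if t <= left or t >= right:
        return 0.0
    if t <= mid:
        return 2 ** q * (t - left)
    return 2 ** q * (right - t)

def schauder_coefficient(x, n):
    """eta_n = x(t_n) minus the average of x at the two end points of J_n."""
    q, p = level_and_offset(n)
    left = p / 2 ** (q - 1)
    right = (p + 1) / 2 ** (q - 1)
    return x((left + right) / 2) - 0.5 * (x(left) + x(right))

def partial_sum(x, count, t):
    """x_count(t) = eta_1 e_1(t) + ... + eta_count e_count(t)."""
    return sum(schauder_coefficient(x, n) * e(n, t) for n in range(1, count + 1))

x = lambda t: t * (1.0 - t)   # continuous, vanishes at 0 and 1
for t in (0.25, 0.5, 0.75):
    # After three terms the polygonal line already passes through x at t_1, t_2, t_3.
    print(t, x(t), partial_sum(x, 3, t))
```

Adding further terms leaves the values at $t_1, t_2, t_3$ unchanged, because $e_n(t_j) = 0$ for $n > j$; this is the "no refitting" property noted in the proof.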
The Schauder series for $x(t)$ converges uniformly in $t$. However, in general,
there is no absolute convergence. A sufficient condition for this to happen is given
by the following theorem.


Theorem 2: The series $\sum \eta_n e_n(t)$ converges absolutely and uniformly in [0, 1]
whenever

$$\sum_{q=1}^{\infty} \sup_{n \in N_q} |\eta_n| < +\infty. \qquad \ldots (2.6)$$


Proof: Since $\{J_n\}$, $n \in N_q$, is a partition of [0, 1) only one term in $\sum_{n \in N_q} |\eta_n| e_n(t)$
can be non-zero. Thus

$$\sum_{n \in N_q} |\eta_n| e_n(t) \le \sup_{n \in N_q} |\eta_n|.$$

The theorem is an immediate consequence of this inequality. Let us note that

$$|x(t)| \le \sum_{n=1}^{\infty} |\eta_n| e_n(t) \le \sum_{q=1}^{\infty} \sup_{n \in N_q} |\eta_n|. \qquad \ldots (2.7)$$
Let us now recall the definition of the orthonormal basis (Haar, 1909)
in $L^2[0, 1]$. This is defined by the following sequence of functions $h_n(t)$:

$$h_0(t) = 1$$

$$h_n(t) = \begin{cases} 0 & \text{if } t \notin J_n \\ 2^{(q_n-1)/2} & \text{if } t \in J_{2n} \\ -2^{(q_n-1)/2} & \text{if } t \in J_{2n+1} \end{cases} \qquad \ldots (2.8)$$


It is clear that

$$e_n(t) = 2^{(q_n+1)/2} \int_0^t h_n(s)\,ds, \qquad n = 1, 2, \ldots \qquad \ldots (2.9)$$
In the following pages, as far as the theory of probability is concerned we will
make use of the two following known criteria.

Theorem 3: Let $[X_1, X_2, \ldots, X_n, \ldots]$ be a sequence of random variables. Then

$$\sum_{n=1}^{+\infty} E|X_n| < +\infty \implies \operatorname{prob}\left[\sum_{n=1}^{+\infty} |X_n| < +\infty\right] = 1.$$


This theorem is a simple consequence of Fatou's lemma.
We note here that no hypothesis is made concerning the independence of the
random variables $X_1, X_2, \ldots$.

Theorem 4: The n random variables $[X_1, \ldots, X_n]$ follow an n-variate normal
law if and only if, for any constants $\lambda_1, \ldots, \lambda_n$ the linear combination

$$X = \lambda_1 X_1 + \ldots + \lambda_n X_n$$

is normally distributed.


3. THE WIENER-LÉVY RANDOM FUNCTION

Let $\eta_1, \ldots, \eta_n, \ldots$ be a sequence of mutually independent and normally distributed
random variables such that

$$E\eta_n = 0, \qquad E\eta_n^2 = \frac{\sigma^2}{2^{q_n+1}}, \qquad E\eta_n\eta_m = 0 \ \text{ for } n \ne m. \qquad \ldots (3.1)$$


Theorem 5: The series

$$w(t) = \sum_{n=1}^{\infty} \eta_n e_n(t) \qquad \ldots (3.2)$$

is almost surely, absolutely and uniformly convergent in [0, 1]. Its sum $w(t)$ is a normal
random function continuous with probability one. The mean and covariance of $w(t)$
are given by

$$E\,w(t) = 0$$

$$\Gamma(t, s) = E\,w(t)w(s) = \begin{cases} \sigma^2 t(1-s), & 0 \le t \le s \le 1 \\ \sigma^2 s(1-t), & 0 \le s \le t \le 1. \end{cases}$$


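Proof: From Hölder's inequality it follows that

$$\Big[E \sup_{n \in N_q} |\eta_n|\Big]^4 \le E\Big[\sup_{n \in N_q} |\eta_n|\Big]^4 = E\Big[\sup_{n \in N_q} |\eta_n|^4\Big].$$

But obviously

$$E\Big[\sup_{n \in N_q} |\eta_n|^4\Big] \le \sum_{n \in N_q} E|\eta_n|^4$$

and from (3.1)

$$\sum_{n \in N_q} E|\eta_n|^4 = 2^{q-1} \cdot \frac{3\sigma^4}{2^{2(q+1)}} = \frac{3\sigma^4}{2^{q+3}},$$

and thence

$$E \sup_{n \in N_q} |\eta_n| \le \frac{3^{1/4}\sigma}{2^{(q+3)/4}}.$$

Thus we have

$$\sum_{q=1}^{\infty} E \sup_{n \in N_q} |\eta_n| \le 3^{1/4}\sigma \sum_{q=1}^{\infty} 2^{-(q+3)/4} < +\infty.$$


Consequently, by Theorems 2 and 3 the series (3.2) converges absolutely and uniformly
with probability one. At the same time this proves that $w(t)$ is almost surely
continuous in [0, 1].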


Now, to prove that $w(t)$ is a normal random function we have to show that
for any finite set $[\tau_1, \ldots, \tau_m]$ of values of $t \in [0, 1]$ the m random variables
$w(\tau_1), \ldots, w(\tau_m)$ follow an m-variate normal law. Let us consider the linear combination:

$$X = \lambda_1 w(\tau_1) + \ldots + \lambda_m w(\tau_m).$$

We have obviously

$$X = \sum_{n=1}^{\infty} \eta_n \left[\lambda_1 e_n(\tau_1) + \ldots + \lambda_m e_n(\tau_m)\right]. \qquad \ldots (3.3)$$

But

$$\sum_{n \in N_q} |\eta_n|\,\left|\lambda_1 e_n(\tau_1) + \ldots + \lambda_m e_n(\tau_m)\right| \le \left[|\lambda_1| + \ldots + |\lambda_m|\right] \sup_{n \in N_q} |\eta_n|.$$


Thus (applying again Theorems 2 and 3) the series (3.3) is (even absolutely) convergent
with probability one; X, being the sum of a convergent series of normal independent
random variables, is normally distributed and the conclusion follows by Theorem 4.
The n-variate normal law being completely determined by the values of the
mean $E[w(t)]$ and the covariance $E[w(t)w(s)]$, let us compute these functions.
First, from (3.1) we have:

$$E[w(t)] = 0$$

(by term by term integration in the probability space of $\eta_1, \ldots, \eta_n, \ldots$) and

$$\Gamma(t, s) = E\,w(t)w(s) = \sum_{m,n} E(\eta_m\eta_n)\,e_m(t)e_n(s) = \sum_{n=1}^{\infty} \frac{\sigma^2}{2^{q_n+1}}\,e_n(t)e_n(s). \qquad \ldots (3.5)$$

We shall prove that $\Gamma(t, s)$ is the required function by comparing it with the Green
function of the following problem:

$$y''(t) = -x(t), \qquad y(0) = y(1) = 0. \qquad \ldots (3.6)$$

From (3.5) (due to the uniform convergence of (3.2))

$$\int_0^1 \Gamma(t, s)\,x(s)\,ds = \sigma^2 \sum_{n=1}^{\infty} \frac{1}{2^{q_n+1}}\,e_n(t) \int_0^1 e_n(s)\,x(s)\,ds = -\sigma^2 \sum_{n=1}^{\infty} \frac{1}{2^{q_n+1}}\,e_n(t) \int_0^1 e_n(s)\,y''(s)\,ds.$$

But integrating by parts and using (2.9),

$$-\int_0^1 e_n(s)\,y''(s)\,ds = 2^{(q_n+1)/2} \int_0^1 h_n(s)\,y'(s)\,ds$$

$$= 2^{q_n}\left[2y(t_n) - y\left(\frac{p_n}{2^{q_n-1}}\right) - y\left(\frac{p_n+1}{2^{q_n-1}}\right)\right] = 2^{q_n+1}\,\theta_n,$$

where $\theta_n = y(t_n) - \frac12\left[y\left(\frac{p_n}{2^{q_n-1}}\right) + y\left(\frac{p_n+1}{2^{q_n-1}}\right)\right]$. Thus

$$\int_0^1 \Gamma(t, s)\,x(s)\,ds = \sigma^2 \sum_{n=1}^{\infty} \theta_n e_n(t). \qquad \ldots (3.7)$$

But interpreting the values of the coefficients $\theta_n$ through (2.5) we see that the right
hand side of (3.7) is simply the Schauder series of $y(t)$ and

$$\int_0^1 \Gamma(t, s)\,x(s)\,ds = \sigma^2 y(t)$$

which shows that $\Gamma(t, s)$ is effectively (up to the factor $\sigma^2$) the Green function of (3.6);
this Green function is $t(1-s)$ in $0 \le t \le s \le 1$ and $s(1-t)$ in $0 \le s \le t \le 1$. The
proof of the theorem is complete.
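
The covariance formula can also be checked directly from the series (3.5). The sketch below, with hypothetical function names, sums $\sigma^2 2^{-(q_n+1)} e_n(t) e_n(s)$ over all $n$ up to a chosen level and compares the result with $\sigma^2[\min(t, s) - ts]$; at binary points of that level or lower the truncated sum is already exact, since all later $e_n$ vanish there.

```python
def level_and_offset(n):
    """Return (q, p) with n = 2**(q-1) + p and 0 <= p < 2**(q-1)."""
    q = n.bit_length()
    return q, n - 2 ** (q - 1)

def e(n, t):
    """Triangular function e_n(t), zero outside J_n."""
    q, p = level_and_offset(n)
    left = p / 2 ** (q - 1)
    right = (p + 1) / 2 ** (q - 1)
    mid = (left + right) / 2
    if t <= left or t >= right:
        return 0.0
    return 2 ** q * (t - left) if t <= mid else 2 ** q * (right - t)

def series_covariance(t, s, levels, sigma=1.0):
    """Partial sum of (3.5) over n = 1, ..., 2**levels - 1."""
    total = 0.0
    for n in range(1, 2 ** levels):
        q, _ = level_and_offset(n)
        total += e(n, t) * e(n, s) / 2 ** (q + 1)
    return sigma ** 2 * total

def closed_form(t, s, sigma=1.0):
    """sigma^2 t (1 - s) for t <= s, and sigma^2 s (1 - t) otherwise."""
    return sigma ** 2 * (min(t, s) - t * s)

for t, s in [(0.25, 0.25), (0.25, 0.75), (0.125, 0.5)]:
    print(t, s, series_covariance(t, s, levels=3), closed_form(t, s))
```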


So far we restricted ourselves to the description of the normal random function
$w(t)$ in the interval [0, 1]. Now we will define a normal random function $W(t)$
on $[0, \infty)$. To this purpose, we introduce

$$e_0(t) = \begin{cases} 0 & \text{if } t \le 0 \\ t & \text{if } 0 < t < 1 \\ 1 & \text{if } t \ge 1 \end{cases} \qquad \ldots (3.8)$$

and write

$$W_j(t) = \xi_j e_0(t-j+1) + \sum_{n=1}^{\infty} \eta_{n,j}\,e_n(t-j+1), \qquad j = 1, 2, \ldots \qquad \ldots (3.9)$$

where $e_1, e_2, \ldots$ are the functions introduced earlier (cf. (2.4)), their definition being
extended to $(-\infty, +\infty)$ by an extension of the convention

$$e_n(t) = 0, \qquad t \notin J_n, \qquad \ldots (3.10)$$

and $\xi_j, \eta_{1,j}, \eta_{2,j}, \ldots, \eta_{n,j}, \ldots$, $j = 1, 2, \ldots$
are independent normal random variables with

$$E\xi_j = 0, \quad E\xi_j^2 = \sigma^2, \qquad E\eta_{n,j} = 0, \quad E\eta_{n,j}^2 = \frac{\sigma^2}{2^{q_n+1}}. \qquad \ldots (3.11)$$

Let us define

$$W(t) = \sum_{j=1}^{\infty} W_j(t). \qquad \ldots (3.12)$$


Then, for any positive integer m we have in the interval $m-1 \le t \le m$,

$$W(t) = \xi_1 + \ldots + \xi_{m-1} + \xi_m e_0(t-m+1) + \sum_{n=1}^{\infty} \eta_{n,m}\,e_n(t-m+1). \qquad \ldots (3.13)$$


From the same reasoning as in the proof of Theorem 5 we see that the series is almost
surely, absolutely and uniformly convergent in $[m-1, m]$. Thus, applying this result
successively to the intervals [0, 1], [1, 2], ..., $[m-1, m]$ we see that $W(t)$, defined by
(3.12), is a normal random function, continuous with probability one in any finite
interval.


Now let us put

$$w_m(t) = \sum_{n=1}^{\infty} \eta_{n,m}\,e_n(t-m+1).$$
The computations of $E[w(t)]$ and $E[w(t)w(s)]$ made in [0, 1] apply without any
modification to $w_m(t)$ in $[m-1, m]$, replacing 0 by $m-1$ and 1 by $m$:

$$E[w_m(t)] = 0$$

$$E[w_m(t)w_m(s)] = \begin{cases} \sigma^2(t-m+1)(m-s), & m-1 \le t \le s \le m \\ \sigma^2(s-m+1)(m-t), & m-1 \le s \le t \le m. \end{cases}$$

Thus, from (3.13), we have

$$E[W(t)] = 0 \qquad \ldots (3.14)$$

$$E[W(t)W(s)] = \sigma^2(m-1) + \sigma^2(t-m+1)(s-m+1) + \sigma^2(t-m+1)(m-s) = \sigma^2 t, \quad m-1 \le t \le s \le m \qquad \ldots (3.15)$$

and

$$E[W(t)W(s)] = \sigma^2(m-1) + \sigma^2(t-m+1)(s-m+1) + \sigma^2(s-m+1)(m-t) = \sigma^2 s, \quad m-1 \le s \le t \le m. \qquad \ldots (3.16)$$

We have supposed that both $t$ and $s$ are in the same interval $[m-1, m]$; if

$$p-1 \le t \le p, \qquad q-1 \le s \le q,$$

the computations are even simpler; from (3.13) we have

$$W(t) = \xi_1 + \ldots + \xi_{p-1} + \xi_p e_0(t-p+1) + \sum_{n=1}^{\infty} \eta_{n,p}\,e_n(t-p+1)$$

$$W(s) = \xi_1 + \ldots + \xi_{q-1} + \xi_q e_0(s-q+1) + \sum_{n=1}^{\infty} \eta_{n,q}\,e_n(s-q+1).$$

Thus, if $p \le q-1$,

$$E[W(t)W(s)] = \sigma^2(p-1) + \sigma^2(t-p+1) = \sigma^2 t \qquad \ldots (3.17)$$

and if $q \le p-1$,

$$E[W(t)W(s)] = \sigma^2(q-1) + \sigma^2(s-q+1) = \sigma^2 s. \qquad \ldots (3.18)$$

Thus from (3.15), (3.16), (3.17) and (3.18) we conclude that for any finite values of
$t$ and $s$

$$E[W(t)W(s)] = \sigma^2 \min(t, s). \qquad \ldots (3.19)$$
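
The case analysis leading to (3.19) can be reproduced mechanically. The following sketch assumes the block structure of (3.9) and the variances (3.11): for each block $j$ it adds the contribution $\sigma^2 e_0(t-j+1)e_0(s-j+1)$ of $\xi_j$ and the contribution of the $\eta_{n,j}$ up to a chosen level. Function names and the truncation parameters `blocks` and `levels` are illustrative.

```python
def level_and_offset(n):
    """Return (q, p) with n = 2**(q-1) + p and 0 <= p < 2**(q-1)."""
    q = n.bit_length()
    return q, n - 2 ** (q - 1)

def e(n, t):
    """Triangular function e_n(t), extended by zero outside J_n."""
    q, p = level_and_offset(n)
    left = p / 2 ** (q - 1)
    right = (p + 1) / 2 ** (q - 1)
    mid = (left + right) / 2
    if t <= left or t >= right:
        return 0.0
    return 2 ** q * (t - left) if t <= mid else 2 ** q * (right - t)

def e0(t):
    """Ramp function of (3.8): 0 for t <= 0, t on (0, 1), 1 for t >= 1."""
    return min(max(t, 0.0), 1.0)

def wiener_covariance(t, s, blocks, levels, sigma=1.0):
    """Covariance of W(t), W(s) summed block by block from the series."""
    total = 0.0
    for j in range(1, blocks + 1):
        tj, sj = t - j + 1, s - j + 1
        total += e0(tj) * e0(sj)                     # contribution of xi_j
        for n in range(1, 2 ** levels):              # contribution of eta_{n,j}
            q, _ = level_and_offset(n)
            total += e(n, tj) * e(n, sj) / 2 ** (q + 1)
    return sigma ** 2 * total

for t, s in [(1.25, 1.75), (1.5, 2.25), (0.5, 2.5)]:
    print(t, s, wiener_covariance(t, s, blocks=3, levels=3), min(t, s))
```

`blocks` must be at least as large as the integer interval index of the larger argument; the agreement with $\sigma^2\min(t, s)$ is exact when the fractional parts are binary points of level at most `levels`.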


Now we come to the crucial point; the mean and covariance determine completely
the separable version (see, for this concept, J. L. Doob (1953) p. 392) of a normal
random function. The Brownian motion function of N. Wiener-P. Lévy is by definition
the separable version of the normal random function having

mean = 0
covariance = $\sigma^2 \min(t, s)$.

Theorem 6: The normal random function $W(t)$ defined in any finite interval
by the almost surely, absolutely and uniformly convergent series (3.12) is identical with
the Wiener-Lévy function.


Let us point out that from (3.14) and (3.19) follows immediately the fundamental
property of $W(t)$, which very often is taken as its characteristic feature.


Corollary: For any finite set

$$0 \le t_0 < t_1 < \ldots < t_p$$

the random variables

$$U_j = W(t_j) - W(t_{j-1}), \qquad j = 1, \ldots, p \qquad \ldots (3.20)$$

are independent, normally distributed and

$$E(U_j) = 0 \qquad \ldots (3.21)$$

$$E(U_j^2) = \sigma^2(t_j - t_{j-1}). \qquad \ldots (3.22)$$

We will now prove the following theorem.
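Theorem 7: The normal random function $W(t)$ is almost surely not of bounded
variation on any given interval.

Proof: We will restrict ourselves to the case when the interval is [0, 1].
The increment of $W(t)$ on $J_n$, for $n \in N_q$,

$$\delta_n = W\left(\frac{p_n+1}{2^{q-1}}\right) - W\left(\frac{p_n}{2^{q-1}}\right)$$

is a normal random variable and

$$E\delta_n = 0, \qquad E\delta_n^2 = \frac{\sigma^2}{2^{q-1}}.$$

Moreover, the $2^{q-1}$ increments $\delta_n$, when $n$ runs through $N_q$, are independent random
variables, because the $J_n$, $n \in N_q$, define a binary partition of $J_1$. Thus

$$V_q = \sum_{n \in N_q} |\delta_n| \qquad \ldots (3.24)$$

defines the variation of $W(t)$ over the binary partition of order $q$ by a sum of $2^{q-1}$
independent random variables. Since

$$J_n = J_{2n} \cup J_{2n+1}$$

and $2n, 2n+1 \in N_{q+1}$, one has $\delta_n = \delta_{2n} + \delta_{2n+1}$. Thus

$$V_q \le V_{q+1}. \qquad \ldots (3.25)$$

To establish the theorem we will prove that

$$\operatorname{prob}\left[\lim_{q \to \infty} V_q = +\infty\right] = 1. \qquad \ldots (3.26)$$

We have

$$E V_q = \sum_{n \in N_q} E|\delta_n| = 2^{q-1} \cdot \sigma\,2^{-(q-1)/2} \sqrt{\frac{2}{\pi}} = \sigma \sqrt{\frac{2}{\pi}}\;2^{(q-1)/2}.$$

Thus $\lim_{q \to \infty} E V_q = \infty$. On the other hand the variance of $V_q$ satisfies the following
inequality

$$\operatorname{var}(V_q) = \sum_{n \in N_q} \operatorname{var}(|\delta_n|) \le \sum_{n \in N_q} E|\delta_n|^2 = \sigma^2.$$

Now consider the random variable

$$Z_q = \frac{V_q - E V_q}{E V_q}.$$

Note that $E Z_q = 0$ and

$$E Z_q^2 = \frac{\operatorname{var}(V_q)}{(E V_q)^2} \le \frac{\sigma^2}{(E V_q)^2} \to 0.$$

Hence $Z_q \to 0$ in probability which together with (3.25) implies that $V_q \to \infty$ with
probability one.*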
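
The two moment estimates used in the proof are easy to tabulate. This is a minimal sketch, assuming the increments over the binary partition of order $q$ are independent $N(0, \sigma^2/2^{q-1})$ variables as stated above; the function names and the `fraction` parameter of the Chebyshev bound are illustrative choices.

```python
import math

def expected_variation(q, sigma=1.0):
    """E V_q for 2**(q-1) independent increments, each N(0, sigma^2 / 2**(q-1))."""
    count = 2 ** (q - 1)
    std = sigma / math.sqrt(count)
    # E|X| = std * sqrt(2 / pi) for a centred normal variable X.
    return count * std * math.sqrt(2.0 / math.pi)

def chebyshev_bound(q, sigma=1.0, fraction=0.5):
    """Bound on P(|V_q - E V_q| >= fraction * E V_q), using var(V_q) <= sigma^2."""
    threshold = fraction * expected_variation(q, sigma)
    return min(1.0, sigma ** 2 / threshold ** 2)

for q in (1, 5, 11, 21):
    print(q, expected_variation(q), chebyshev_bound(q))
```

The expected variation grows like $2^{(q-1)/2}$ while the variance stays bounded by $\sigma^2$, so the probability that $V_q$ falls below half its mean shrinks geometrically in $q$.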


* This proof, shorter than the one given in my lecture, is due to R. Ranga Rao.


4. INTEGRATION WITH RESPECT TO WIENER-LÉVY RANDOM FUNCTION

The problem of integration with respect to the Wiener-Lévy random function
is to give a meaningful definition of the integral

$$\int_0^{\infty} F(t)\,dW(t). \qquad \ldots (4.1)$$


At the outset it is clear that the integral (4.1) has no meaning in the classical
Lebesgue-Stieltjes sense, since $W(t)$ is almost never a function of bounded variation.
A method of defining the integral is given by Doob (1953) as follows.

First, one defines the integrals

$$\int_a^b dW(t)$$

as $W(b) - W(a)$. The definition is then extended to step-functions by linearity. A
completion procedure then leads to the definition of the integral for all square integrable
functions. The completion procedure here is in the sense of mean convergence.

The elementary approach to the Brownian motion process given earlier at once
leads to a definition in which the integral is obtained as the almost sure limit of a
certain well-defined sequence of random variables.


Let $W(t)$ be the Wiener-Lévy random function and $F(t)$ be any square integrable
function in the interval $[0, \infty)$. By definition (see (3.11) and (3.12))

$$W(t) = \sum_{j=1}^{\infty} W_j(t)$$

where

$$W_j(t) = \xi_j e_0(t-j+1) + \sum_{n=1}^{\infty} \eta_{n,j}\,e_n(t-j+1),$$

$\xi_j$, $\eta_{n,j}$ being mutually independent random variables satisfying (3.11).
Now, in order to define $\int_0^{\infty} F(t)\,dW(t)$ it is enough to define it for the interval
[0, 1]. For then it is possible to define the integral over $[m-1, m]$ for any integer
m and then extend the definition by writing

$$\int_0^{\infty} F(t)\,dW(t) = \sum_{m=1}^{\infty} \int_{m-1}^{m} F(t)\,dW(t) = \sum_{m=1}^{\infty} \int_{m-1}^{m} F(t)\,dW_m(t).$$

We shall, therefore, confine our attention to the interval [0, 1]. In this interval

$$W(t) = \xi e_0(t) + \sum_{n=1}^{\infty} \eta_n e_n(t).$$

Since each $e_n(t)$ is a function of bounded variation,

$$\int_0^1 F(t)\,de_n(t) = 2^{(q_n+1)/2} \int_0^1 F(t)\,h_n(t)\,dt = 2^{(q_n+1)/2} F_n$$

where

$$F_n = \int_0^1 F(t)\,h_n(t)\,dt$$

are simply the Fourier coefficients of $F(t)$ with respect to the Haar functions. Since
the function $F \in L^2[0, 1]$ it follows that

$$\sum_{n=0}^{\infty} F_n^2 = \int_0^1 |F(t)|^2\,dt < \infty.$$
The definition of the Wiener integral may then be given as follows.

Theorem 8: Let the pseudo-Stieltjes integral be defined as follows:

$$\int_0^1 F(t)\,dW(t) = \xi F_0 + \sum_{n=1}^{\infty} \eta_n\,2^{(q_n+1)/2} F_n. \qquad \ldots (4.2)$$


Then the integral is well defined (i.e. the series on the right converges with probability
one). Moreover, if $X_F$ denotes the integral (4.2), then

$$E X_F = 0, \qquad E X_F^2 = \sigma^2 \int_0^1 |F(t)|^2\,dt \qquad \ldots (4.3)$$

and for any two $F, G \in L^2[0, 1]$

$$E(X_F X_G) = \sigma^2 \int_0^1 F(t)\,G(t)\,dt. \qquad \ldots (4.4)$$


The correspondence between the Hilbert spaces of all functions $F(t) \in L^2[0, 1]$ and of all
normal random variables X such that $EX = 0$, $EX^2 < \infty$ is thus an isometric
isomorphism.


For a proof we have only to verify the convergence of the series

$$\sum_{n=1}^{\infty} \eta_n\,2^{(q_n+1)/2} F_n.$$

Now let $y_n = \eta_n\,2^{(q_n+1)/2} F_n$. Then clearly

$$E y_n = 0, \qquad E y_n^2 = \sigma^2 F_n^2.$$


Also, the random variables $y_n$ are mutually independent and therefore, by
Kolmogorov's theorem, the series converges with probability one.


If

$$X_F = \xi F_0 + \sum_{n=1}^{\infty} \eta_n\,2^{(q_n+1)/2} F_n \quad \text{and} \quad X_G = \xi G_0 + \sum_{n=1}^{\infty} \eta_n\,2^{(q_n+1)/2} G_n,$$

it is easily verified that

$$E(X_F X_G) = \sigma^2 \sum_{n=0}^{\infty} F_n G_n = \sigma^2 \int_0^1 F(t)\,G(t)\,dt.$$


This demonstrates the theorem completely.
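
The identity behind (4.3) is Parseval's relation for the Haar system, and it can be verified exactly for step functions. The sketch below is an illustration under stated assumptions: $F$ is taken constant on $2^Q$ equal cells of [0, 1] and is passed as a list of cell values (length a power of two), the Haar functions are normalised as $h_n = \pm 2^{(q_n-1)/2}$ on the two halves of $J_n$, and $E\xi^2 = \sigma^2$, $E\eta_n^2 = \sigma^2/2^{q_n+1}$. All names are hypothetical.

```python
def level_and_offset(n):
    """Return (q, p) with n = 2**(q-1) + p and 0 <= p < 2**(q-1)."""
    q = n.bit_length()
    return q, n - 2 ** (q - 1)

def haar_coefficient(values, n):
    """F_n = integral of F * h_n over [0, 1] for a step function F given by cell values."""
    size = len(values)
    if n == 0:
        return sum(values) / size                  # h_0 = 1
    q, p = level_and_offset(n)
    cells = size // 2 ** (q - 1)                   # number of cells inside J_n
    start = p * cells
    mid = start + cells // 2
    left = sum(values[start:mid]) / size           # integral of F over J_{2n}
    right = sum(values[mid:start + cells]) / size  # integral of F over J_{2n+1}
    return 2 ** ((q - 1) / 2) * (left - right)

def wiener_integral_variance(values, sigma=1.0):
    """E X_F^2 = sigma^2 * (F_0^2 + sum of F_n^2), summed over n < len(values)."""
    coefficients = [haar_coefficient(values, n) for n in range(len(values))]
    return sigma ** 2 * sum(c * c for c in coefficients)

step_values = [1.0, 3.0, -2.0, 0.0]                # F on four cells of width 1/4
mean_square = sum(v * v for v in step_values) / len(step_values)
print(wiener_integral_variance(step_values), mean_square)
```

For a step function on $2^Q$ cells only the coefficients with $n < 2^Q$ are non-zero, so the finite sum already equals $\int_0^1 F^2\,dt$.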
ON TESTING THE MEAN OF A DISCRETE LINEAR PROCESS


By K. R. PARTHASARATHY
Indian Statistical Institute


SUMMARY. In this paper a method of testing the mean function of a discrete linear stationary
process is given when the spectral density function is known. The asymptotic distribution of the
proposed test statistic is derived.


1. INTRODUCTION 


The problem of testing the mean function of a linear process has been solved 
by Grenander and Rosenblatt (1951) in the case when the process is Gaussian and 
a simple hypothesis is being tested against a simple alternative. In this paper 
we consider the problem of testing the null hypothesis that the mean function of an 
arbitrary linear process is zero against all alternatives. The asymptotic distribution 
of the proposed test statistic is derived. The test is shown to be consistent against 
certain alternatives in the class of Gaussian processes. 


2. THE TEST 
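Let $[x_1, x_2, \ldots, x_N]$ be observations on a discrete linear stochastic process $[x_t]$
which is of the form

$$x_t = y_t + m_t, \qquad y_t = \sum_{\nu=-\infty}^{+\infty} a_\nu \xi_{t-\nu} \qquad \ldots (2.1)$$

$$E|\xi_t|^4 < \infty, \qquad \sum_{\nu=-\infty}^{+\infty} |a_\nu|^2 < \infty$$

where $\xi_t$ are independent and identically distributed random variables, each with
mean zero and variance one. Let

$$\Sigma = (\sigma_{ij}), \qquad 1 \le i \le N,\ 1 \le j \le N$$

be the dispersion matrix of $[x_1, \ldots, x_N]$. The function

$$f(\lambda) = \frac{1}{2\pi} \left|\sum_{\nu=-\infty}^{+\infty} a_\nu e^{i\nu\lambda}\right|^2 \qquad \ldots (2.2)$$

is the spectral density of the process. The problem is to test the null hypothesis

$$H_0 : m_t = 0 \quad (t = \ldots, -1, 0, 1, \ldots). \qquad \ldots (2.3)$$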
If the process is Gaussian the likelihood ratio test will lead to the statistic $x\Sigma^{-1}x'$,
where $x = [x_1, x_2, \ldots, x_N]$ and large values of the statistic are significant. The test
based on this can be shown to be consistent against all alternatives for which

$$\lim_{N \to \infty} \frac{1}{\sqrt{N}}\,[m_1 \ldots m_N]\,\Sigma^{-1}\,[m_1 \ldots m_N]' = \infty. \qquad \ldots (2.4)$$

But this statistic, even though its exact distribution is known, is not very suitable,
since for large N it is difficult to compute $\Sigma^{-1}$. We suggest here a more convenient
alternative procedure which also is consistent against alternatives of the type just
considered.

Consider the expression

$$X_b = \frac{\sum_{t=1}^{N} b_t x_t}{\left[\sum_{s=1}^{N}\sum_{t=1}^{N} \sigma_{st}\,b_s b_t\right]^{1/2}} \qquad \ldots (2.5)$$

where $b_1, b_2, \ldots, b_N$ are arbitrary constants. If all the $x_t$'s are of expectation zero
we expect $|X_b|$ to be small. If we maximise $|X_b|^2$ over all $b_1, \ldots, b_N$ we get $x\Sigma^{-1}x'$.
Instead we rewrite (2.5) as

$$X_h = \frac{\dfrac{1}{2\pi}\displaystyle\int_{-\pi}^{\pi} h(\lambda)\,\overline{\sum_{t=1}^{N} x_t e^{it\lambda}}\,d\lambda}{\left[\displaystyle\int_{-\pi}^{\pi} |h(\lambda)|^2 f(\lambda)\,d\lambda\right]^{1/2}} \qquad \ldots (2.6)$$

where

$$h(\lambda) = \sum_{t=1}^{N} b_t e^{it\lambda}. \qquad \ldots (2.7)$$


Now we maximise (2.6) over all functions $h(\lambda)$, square integrable with respect to $f(\lambda)$,
instead of over $h(\lambda)$ of the type (2.7). Assuming that $1/f(\lambda)$ is integrable over $(-\pi, \pi)$,
we obtain thus the statistic

$$Y_N = \max_{h} |X_h|^2 = \frac{1}{4\pi^2} \int_{-\pi}^{\pi} \frac{1}{f(\lambda)} \left|\sum_{t=1}^{N} x_t e^{it\lambda}\right|^2 d\lambda \qquad \ldots (2.8)$$

by an application of Schwarz's inequality.
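
A numerical sketch of the statistic may help fix ideas. It assumes that (2.8) has the form $Y_N = (4\pi^2)^{-1}\int_{-\pi}^{\pi} |\sum_t x_t e^{it\lambda}|^2 / f(\lambda)\,d\lambda$ and approximates the integral by an equally spaced sum; the function names, the default grid size, and the white-noise example are illustrative assumptions, not part of the paper. For white noise, $f(\lambda) = 1/2\pi$ and the statistic reduces to $\sum x_t^2$, which gives a simple check.

```python
import cmath
import math

def mean_test_statistic(xs, spectral_density, grid_size=None):
    """Approximate Y_N by an equally spaced sum over (-pi, pi)."""
    n = len(xs)
    m = grid_size if grid_size is not None else 8 * n   # hypothetical default grid
    step = 2.0 * math.pi / m
    total = 0.0
    for k in range(m):
        lam = -math.pi + k * step
        amplitude = sum(x * cmath.exp(1j * t * lam) for t, x in enumerate(xs, start=1))
        total += abs(amplitude) ** 2 / spectral_density(lam)
    return total * step / (4.0 * math.pi ** 2)

def standardized(xs, spectral_density, fourth_moment=3.0):
    """sqrt(N) (Y_N / N - 1) / sigma with sigma^2 = E xi^4 - 1 (equal to 2 for normal xi)."""
    n = len(xs)
    y = mean_test_statistic(xs, spectral_density)
    return math.sqrt(n) * (y / n - 1.0) / math.sqrt(fourth_moment - 1.0)

white_noise_density = lambda lam: 1.0 / (2.0 * math.pi)
sample = [0.3, -1.2, 0.8, 1.5, -0.4]
print(mean_test_statistic(sample, white_noise_density), sum(x * x for x in sample))
print(standardized(sample, white_noise_density))
```

For a general spectral density the sum is only an approximation to the integral, and a finer grid (or a proper quadrature rule) would be needed where $1/f(\lambda)$ varies rapidly.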


The asymptotic null distribution of $Y_N$ is given by the following theorem.


Theorem 1: Suppose that $a_t = O(|t|^{-\beta})$, $\beta > 3/2$ and $\int_{-\pi}^{\pi} [f(\lambda)]^{-1} e^{it\lambda}\,d\lambda = O(t^{-1})$.
Then, under the null hypothesis, the distribution of $\sqrt{N}\,[(Y_N/N) - 1]$ converges to the
normal distribution with mean zero and variance $\sigma^2$ where $\sigma^2 = E\xi_t^4 - 1$.
Proof: Let

$$I_N(\lambda) = \frac{1}{2\pi N} \left|\sum_{t=1}^{N} x_t e^{it\lambda}\right|^2 \qquad \ldots (2.9)$$

$$I_{N,\xi}(\lambda) = \frac{1}{2\pi N} \left|\sum_{t=1}^{N} \xi_t e^{it\lambda}\right|^2.$$


Then, by (2.1), if $m_t = 0$ for all t, we have


anyN 1 0—27) La ИЛА nant FE У «sas 020 


=-@ т=-® 


N N+s N-VÓ 
where da m X арт X MEE. pt e. (211) 
Mal "el тәм п=1+7 
ang mom _ ЖЛ inm E i 
DET 3 [АХ dà. 2, (2.12) 


There may be a lattice rectangle $R$ of points $(n, m)$ common to the sums
involved in (2.11). The expressions corresponding to those points will cancel. Let
$C$ be the complement of $R$ with respect to the set consisting of all the lattice
points in both the summations. We then have

Е X .[g(n—m)] ace (2,18) 
[ant < (n,m)eo? 
where 3 glk) = (|. (2.14) 


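Since $\rho_t = O(t^{-1})$ and $a_t = O(t^{-\beta})$, $\beta > 3/2$, by the analysis carried out by Grenander
and Rosenblatt (1951, chapter 6, pp. 191-192) we have

$$\sqrt{N} \int_{-\pi}^{\pi} \frac{I_N(\lambda) - 2\pi f(\lambda)\,I_{N,\xi}(\lambda)}{f(\lambda)}\,d\lambda \to 0$$

in probability as $N \to \infty$, where $I_{N,\xi}(\lambda)$ is the periodogram of the $\xi_t$. We have

$$\sqrt{N}\left[\int_{-\pi}^{\pi} I_{N,\xi}(\lambda)\,d\lambda - 1\right] = \sqrt{N}\left[\frac{1}{N}\sum_{t=1}^{N} \xi_t^2 - 1\right]. \qquad \ldots (2.15)$$


An application of the central limit theorem shows that the limit distribution of (2.15)
is normal with mean zero and variance

$$\sigma^2 = E\xi_t^4 - 1.$$


This completes the proof.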
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3. CONSISTENCY OF THE TEST WHEN THE 6, ARE NORMALLY DISTRIBUTED 


The proposed test of the null hypothesis $H_0$ is the following: Reject $H_0$ if and
only if
УМУ уух) 1] > Ka E (3.1) 


where $K_\alpha$ is the upper $\alpha$ percent point of the normal distribution with mean zero and
variance unity.
Theorem 2: If the $\epsilon_t$ are normally distributed, the test given by the critical region


/ 


ФТ rN > Ж.) 


where $K_\alpha$ is the $\alpha$ percent point of the normal distribution with mean zero and variance
unity, is consistent against all alternatives $\{m_t\}$ $(t = \ldots, -1, 0, 1, 2, \ldots)$ which satisfy the
condition (2.4).


Proof: Since «УТ! a < Yy, we have 


^ Plrejecting Нот = PLV/NBI(Y yx) -1] > Ki mj] 


-1 , 
>Р| VNE ( Ета Ja] га] 4 2. (3.3) 
But Elo tas a' = (ат) X? (xm) F 2m E? (zm) m X m 


where .. m = [т ть, ... my]: 
Since the limiting distributions of А 


VNB [сасы | 


m X3 (®=—т)'- 


and 
(m Х-1т/)% 
exist as У-эо, and ^ ^" - АБУ) \ 
x0 ММ ч ES $ 


it is easy to see that, in (3.3), the left side of the inequality within square brackets
tends to infinity in probability. Thus the limit of (3.3) as $N \to \infty$ is unity. This
completes the proof of Theorem 2.

ON A PROBLEM OF BARTHOLOMEW IN LIFE TESTING


By P. S. SWAMY and S. A. D. C. DOSS
University of Poona, India


SUMMARY. Bartholomew's problem in life testing is considered under a more general model of
Mendenhall and Hader, viz., that the parent failure population is made up of two different sub-populations
mixed in unknown proportions, each sub-population representing a different cause of failure. Maximum
likelihood estimating equations of the parameters and the variance-covariance matrix of the estimates
are obtained. Earlier well-known results in life testing are realised as particular cases of the present model.


1. INTRODUCTION 


Until recently, the literature on life testing was concerned with estimation
of parameters and testing procedures associated with observations from a single
exponential distribution. Bartholomew (1957) considered, under the assumption
that all items that are put on life test come from the same exponential life distribution,
the problem of obtaining the maximum likelihood estimate of the mean life when
records are available giving the dates of installation on life test of $n$ items of equipment,
and the life test is terminated at some time $T$ without reference to the experiment.
Recently, Mendenhall and Hader (1958) considered the problem of estimation of
parameters of mixed exponentially distributed failure populations. In the light of
Mendenhall and Hader's general model, Bartholomew's assumption often looks
unrealistic. In practice, as was shown by Mendenhall and Hader (1958), there are
often different causes of failure of an item and there are different distributions of
failure representing the different causes of failure; hence it is reasonable to assume
that the parent distribution of failure of items is made up of two or more
sub-populations of failure. The purpose of this note is to study Bartholomew's problem
under the general model of mixed distributions of Mendenhall and Hader.

Many failure data, in practice, are seen to follow an exponential failure distribution
(Davis, 1952). In the present note, following Mendenhall and Hader (1958),
we assume the parent failure distribution to be made up of two sub-populations, each
distributed as

$$F_i(t) = 1 - e^{-t/\theta_i}, \qquad t > 0,\ \theta_i > 0,\ i = 1, 2.$$

Let the two sub-populations be mixed in proportion $p : q$ $(p + q = 1;\ p, q > 0)$. Then
the frequency function of the parent population is given by

$$f(t) = \frac{p}{\theta_1}\, e^{-t/\theta_1} + \frac{q}{\theta_2}\, e^{-t/\theta_2}.$$

We assume that as soon as an item fails, the cause of failure is known and the
sub-population from which the item comes is identified.

Without interfering unduly with the normal running of a factory process, $n$
items are put on life test at different points of time and the test is terminated at some
time $T$. Let $T_j$ be the time passed since the $j$-th item was put to test and $t_j$ be the
length of life of that item. If $t_j < T_j$, the $j$-th item is said to have failed; otherwise it
is said to have survived. The probability of survival of the $j$-th item is given by

$$Q_j = p\, e^{-T_j/\theta_1} + q\, e^{-T_j/\theta_2}.$$

Let $P_j = 1 - Q_j$.


2. MAXIMUM LIKELIHOOD ESTIMATES OF THE PARAMETERS


If at the time of termination of the test $r_i$ items from the $i$-th sub-population
have failed, the likelihood of the sample is given by


L= 0р" д 91—24 (fi). ЕРТ n» (21) 


where $a_j = 1$ or 0 according as the $j$-th item has failed or survived and $C$ is a constant.
The maximum likelihood estimates of $\theta_1$, $\theta_2$ and $p$ are obtained by equating the partial
derivatives of the logarithm of the likelihood, $L$, to zero and solving the equations
simultaneously.


The maximum likelihood estimating equations are 


9108 Ё 2 i я & lw. 
rui {ваа ау (4 zi = 0, pons) 
д log L NOSE; EAS 
Ж” b (а-ма-а) 53 i0 (% a 91] = 0, es (2:3) 
9lgL 7 rs i 5 (=?) (1-4 )-0 
др р 9 i м s 
ðlgL _ те-Тіт ЖҰ 
PN др ` pe Tilh qe- Ty (2.4) 
Qj = еє—Тїӊ, The estimating equations сап also be written as 
n 
2 түһ 5 1-а 
$ Reid Эс ... (2.5) 
т 
1 n 
8, = zÀ Фа-а Тай}, ... (2.6) 
I n 
0, = 5 2 RNL 247,251). nu (220) 
ә ізі 
where, now, Ё, = Mu, ... (2.8) 


The simultaneous equations (2.5), (2.6) and (2.7) can be solved for the maximum 
likelihood estimates by using a suitably modified iterative process of Mendenhall 
and Hader (1958). By substituting $p$, $\theta_1$, $\theta_2$ of equations (2.5), (2.6) and (2.7) into
(2.8) it can be easily seen that equation (2.8) is of the form


ї, өті дй i, EUM В). 


If a good first approximation to the vector $[k_{10}, k_{20}, \ldots, k_{n0}]$ is available, the modified
iterative process is as follows. The accurate value of $k_j$ is the solution of the equation
$g_j - h_j = 0$. But when we are proceeding with an approximate $[k_j]$ we have


9—6 = D;. 
1 SO T; T; 
de phase бхр; у). 
Let ГА [ГЕТ where v; = ex» (z a.) 
p 
ду А 3 
о 1527 : ‚.. (9.9) 
әр) № 
ak —0 дш —1, 15-4 
aĝ, 
дь ] Dm qs 
oss % Ж (С; HINT, үр 


The equation (2.9) сап also be written as 


where бу is the Kronecker delta. Since 
dD; = — PI g 2 д jak, 
S ee, 


now we have the relation 


ар, it, 
ар, 2 ду айк, 
: = ah, + | : 
ар, die, 
дь, $ 
Assuming the matrix [= %, +0] to be non-singular 
we obtain 
$ р 
О ET e 0v; ті 9 
ІП те а ШЕ 
di, aD, 
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choosing [dD,, ар», ..., dD,] = —[Dyo, Day, ..., Р] 
we have 
гы % аб Ж Ту 
Ў, ай Ё д =| D 
AE eX oed E a era ү UU M 
H 2 : Ok, : 5 
kee Ё ай os Fino Т, yi 


Thus we get a second approximation to $[k_j]$, viz., $[k_{j1}]$. Putting this value of $[k_j]$ in
equations (2.5), (2.6) and (2.7) we get the first approximate values of $p$, $\theta_1$ and $\theta_2$.
The iterative process can be repeated until the desired degree of accuracy is attained.
$k_{j0} = \tfrac{1}{2}$ for all $j$ can serve well as a first approximation to $k_j$. No definite inference
can be drawn about the parameters when $r_i = 0$. However, it is easily seen that


1 


та ` 
> È Ê T, if n—0 
ј=1 
and b> È (Ê) Т, if һ,-о 
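
The iteration described in this section can be illustrated with a deliberately simpler scheme. The sketch below is not the authors' modified process (which corrects the vector $[k_j]$ through the matrix of partial derivatives); it uses plain successive substitution instead. It assumes the estimating equations take the form that the likelihood of the stated model leads to, namely $k_j = p e^{-T_j/\theta_1}/Q_j$ for each survivor, $p = (r_1 + \sum k_j)/n$, $\theta_1 = (\sum t + \sum k_j T_j)/r_1$ and $\theta_2 = (\sum t + \sum (1-k_j) T_j)/r_2$, with the sums of $k_j$ taken over survivors. The function name, argument names and tolerances are hypothetical.

```python
import math


def fit_two_cause_exponential(fail1, fail2, survivor_ages,
                              tol=1e-12, max_iter=100000):
    """Successive-substitution sketch for (p, theta1, theta2).

    fail1, fail2   -- lifetimes of items that failed from cause 1 / cause 2
    survivor_ages  -- times on test T_j of items still surviving at termination
    Assumed update equations (see the lead-in); not the authors' iteration.
    """
    r1, r2 = len(fail1), len(fail2)
    if r1 == 0 or r2 == 0:
        # The text notes that no definite inference is possible when r_i = 0.
        raise ValueError("need at least one failure from each sub-population")
    n = r1 + r2 + len(survivor_ages)
    k = [0.5] * len(survivor_ages)  # first approximation k_j0 = 1/2
    p = theta1 = theta2 = None
    for _ in range(max_iter):
        # Parameters implied by the current weights k_j.
        p_new = (r1 + sum(k)) / n
        t1_new = (sum(fail1) + sum(kj * T for kj, T in zip(k, survivor_ages))) / r1
        t2_new = (sum(fail2) + sum((1 - kj) * T for kj, T in zip(k, survivor_ages))) / r2
        # Weights implied by the new parameters: share of the survival
        # probability Q_j attributable to the first sub-population.
        k = []
        for T in survivor_ages:
            s1 = p_new * math.exp(-T / t1_new)
            s2 = (1 - p_new) * math.exp(-T / t2_new)
            k.append(s1 / (s1 + s2))
        converged = (p is not None and
                     max(abs(p_new - p), abs(t1_new - theta1),
                         abs(t2_new - theta2)) < tol)
        p, theta1, theta2 = p_new, t1_new, t2_new
        if converged:
            break
    return p, theta1, theta2


if __name__ == "__main__":
    print(fit_two_cause_exponential([1.0, 2.0, 3.0], [10.0, 20.0], [5.0, 5.0, 8.0]))
```

When there are no survivors the scheme returns $p = r_1/n$ and the two sample mean lives after a single pass, which is a quick check on the assumed equations. Successive substitution is usually slower than a derivative-based correction, so it is offered only as an easily checked baseline.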
2-1 


3. PARTICULAR CASES OF THE PRESENT MODEL


Some well-known earlier results in life testing can now be obtained as particular
cases of our model. In the present model $\hat{\theta}$ is the solution of the equation


We shall obtain, in the following, four special cases of our model : 
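(I) In the present model, when $\theta_1 = \theta_2 = \theta$, $k_j = 1$ for all $j$ and $p = 1$,

$$\hat{\theta} = \frac{\sum_{j=1}^{n} \{a_j t_j + (1 - a_j) T_j\}}{\sum_{j=1}^{n} a_j} = \frac{1}{r} \sum_{j=1}^{n} \{a_j t_j + (1 - a_j) T_j\}.$$

This is the result of Bartholomew's (1957) model.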
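
As a small numerical illustration of case (I), the estimate is the total time on test (lifetimes of failed items plus ages of survivors) divided by the number of failures. The sketch below assumes each item is supplied as a pair (lifetime or `None`, time on test); the function name and the data are hypothetical.

```python
def bartholomew_mean_life(items):
    """Estimate the mean life from (lifetime, time_on_test) pairs.

    lifetime is None (or not less than time_on_test) for an item that
    survived to the end of the test, so that a_j = 0 for that item.
    """
    total_time = 0.0
    failures = 0
    for lifetime, time_on_test in items:
        if lifetime is not None and lifetime < time_on_test:
            total_time += lifetime      # a_j = 1: contributes t_j
            failures += 1
        else:
            total_time += time_on_test  # a_j = 0: contributes T_j
    if failures == 0:
        raise ValueError("no failures observed; the estimate is undefined")
    return total_time / failures


if __name__ == "__main__":
    sample = [(2.0, 10.0), (None, 4.0), (3.0, 8.0), (None, 6.0)]
    print(bartholomew_mean_life(sample))  # (2 + 4 + 3 + 6) / 2
```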


(II) When $\theta_1 = \theta_2 = \theta$ and $T_j = T$ for all the items (that is, when all the
$n$ items are put to test simultaneously),

$$\hat{\theta} = \frac{1}{\sum_{j} a_j} \sum_{j=1}^{n} \{a_j t_j + (1 - a_j) T\} = \frac{1}{r} \left[ \sum_{j=1}^{r} t_j + (n - r) T \right].$$

This is a particular case of Bartholomew's model.


(III) When $\theta_1 = \theta_2 = \theta$ and there is no limit imposed on the truncation
time but all the $n$ items are put to test simultaneously and the life test is terminated as
soon as $r$ items fail,

$$\hat{\theta} = \frac{1}{r} \left\{ \sum_{i=1}^{r} t_i + (n - r)\, t_r \right\}.$$

This is the well-known result of Epstein and Sobel (1953). Their other results
of (1954) can also be obtained as special cases of our model.

(IV) In the special case when $T_j = T$ for all $j$ our model is essentially the same
as that of Mendenhall and Hader (1958).


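4. VARIANCE-COVARIANCE MATRIX OF THE ESTIMATES


In the case of several parameters Fisher's information matrix is given by
$[I_{\alpha\beta}]$, where $I_{\alpha\beta} = -E\left(\partial^2 \log L / \partial\alpha\, \partial\beta\right)$. The variance-covariance matrix $[V_{\alpha\beta}]$ is given
by the inverse matrix $[I_{\alpha\beta}]^{-1}$.


The information matrix of $p$, $\theta_1$ and $\theta_2$ is found to be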


Г &pb(T) 5 20019 %-0; 
УА 8 за ӨӨй за б 
2, qPT;) л 0T, 
[1 = a: зл, | 
21 (1-0 
е pua VA 
тув 601—600 
РТ) = 1—6 О 3. 


where $\alpha, \beta = p$, $\theta_1$ or $\theta_2$.


Hence the variance-covariance matrix of the estimates $\hat{p}$, $\hat{\theta}_1$ and $\hat{\theta}_2$ is given by


бл ба бз 
ІУа1- 5 ba ӛз бз |, 
bs, ба bas 
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where by = i i х КОЕЛУ ; 
$ (ғут/1-0)4-0/01)7), 
вы = рї È CT) PATYRGI НОС, 
big = бы = epi 5 Of (oT, -m)— T3, 
bis = вы, = 40} È CyT pC/TM- PAT OB, 
ба = бы =—РЙ È OPOP ТД), 
апа е) ӘЛІ GOTT pa OnT T TIT, — T5) — TjT)— 
(PAPAT HOPA] + $ HOP AT PAT NACo) 


The results of the present note can be extended to a more general model 
where the number of sub-populations, 7, is greater than two. 


The authors have also obtained the estimates of the parameters and their 
variance-covariance matrix in the case when the parent failure population is made 
up of two sub-populations, one exponential and another normal.

STUDENTISATION OF TWO-STAGE SAMPLE MEANS FROM
NORMAL POPULATIONS WITH UNKNOWN
COMMON VARIANCE¹


By HAROLD RUBEN
Columbia University²

SUMMARY. The present paper derives a strengthened version of Stein's classic results
concerning the joint estimation, with the aid of two-stage samples, of a set of means of normal populations
with unknown common variance by confidence sets of predetermined dimensions and confidence
coefficient, and, concomitantly, the testing of the means with a power function independent of the
common variance.


The method used has certain advantages of directness and simplicity, and serves thereby to
throw fresh light on the nature of the statistical inference employed. (This is discussed in some detail
in the concluding section of the paper.) A further advantage is that the method generalizes in a
straightforward manner to deal with the analogous but more difficult problem of sampling from
normal populations for which the variances are completely unknown. It is proposed to discuss the
generalization in subsequent papers.


1. HISTORICAL NOTE 


The general problem of obtaining sequential confidence sets was first posed
by Wald (1947). Previously, however, Barnard (1946) had speculated that the extra
degree of freedom conferred by sequential sampling might allow of the possibility
of controlling the width between the lower and upper bounds of an estimate in interval
estimation. At about the same time, Stein (1945) showed how a two-stage sampling
procedure could be devised to yield confidence intervals of fixed length and confidence
coefficient for the mean of a normal population of unknown variance and, equivalently,
to yield a test for the mean with power independent of the variance. By way of
contrast, it had been shown some years earlier by Dantzig (1940) that these aims
could not be achieved by means of samples of fixed size. More precisely, Dantzig
demonstrated that any test on the mean based on a fixed number of observations,
whose power function is to be independent of the variance, must have constant power
for all values of the mean, equal to the size of the critical region. This result was
extended to the case of the testing of the general linear hypothesis by Stein (1945)
who, in the same paper, generalized his results on a single mean (referred to earlier)
to include a two-stage sampling procedure which yields confidence sets of fixed volume
and confidence coefficient for the set of means from a finite set of normal populations
with unknown common variance and, equivalently, yields a test for the means with
power independent of the common variance.


¹ This research was sponsored by the Office of Naval Research under Contract Number Nonr-266
(33), Project Number NR 042-034. Reproduction in whole or in part is permitted for any purpose
of the United States Government.

² Now with The University, Sheffield, England.

In 1949, the present author (Ruben, 1950), working in ignorance of Stein's results,
rediscovered them in slightly stronger form by a different mode of argument which
may have some advantages in the direction of greater simplicity, directness and
flexibility, and may, therefore, serve to throw further light on the nature of the statistical
inference employed. A further advantage is that the method generalizes quite
naturally to deal with the more difficult problem of sampling from normal populations
where the variances need not be equal and are unknown. Accordingly, confidence
intervals or regions of predetermined dimensions and confidence coefficient for the
means, or functions of the means, and similarly tests relating to the means with power
independent of the unknown variances, are readily established (Ruben, 1950). In
all these cases, the tests and estimation procedures are shown to be conservative
which, however, leads to a certain loss of efficiency. The sampling procedure consists
in obtaining preliminary samples which enable one to form estimates of the unknown
variance or variances, these estimates determining a rule when to stop sampling
at the second stage. From another point of view the loss in efficiency might then
be said to result from the fact that the second-stage samples are not used to obtain
information about the variances. This aspect is discussed in some detail in the
concluding section of the paper.


The author’s results relevant to the situation where the variances are assumed to 
be equal are reproduced here in order to make these more accessible to statisticians 
than has hitherto been the case, and are at the same time somewhat amplified in 
scope (though condensed in form). It is planned that this paper shall serve further 
as a convenient starting point for a subsequent paper which will deal with the case 
where the variances are unequal. 


2. NOTATIONS 


The following notations have been adopted in the text. Let $Z$ be a random
variable which admits of a non-zero probability density function in the range $(z_1, z_2)$.
This function may involve a finite number of known constants, $a_1, a_2, \ldots, a_p$. The
quantity $z^* = z_{a_1, a_2, \ldots, a_p;\, \alpha}$ is defined by the equation

$$\alpha = \int_{z^*}^{z_2} \phi_{a_1, a_2, \ldots, a_p}(\zeta)\, d\zeta,$$

where $\phi_{a_1, a_2, \ldots, a_p}(\zeta)$ is the density function of $Z$. Symbols such as
$t_{\nu;\, \alpha}$, $\chi^2_{\nu;\, \alpha}$, $F_{m, n;\, \alpha}$, etc. then have definite meanings.


Any random variable which is distributed as $\chi^2$ on $\nu$ degrees of freedom will generically
be called a $\chi^2_\nu$. Similarly, we shall speak of a $t_\nu$, an $F_{m,n}$, etc.

Throughout the text, the distribution functions of a $\chi^2_\nu$ and a $t_\nu$ will be denoted
by $F_\nu(\chi^2)$ and $G_\nu(t)$, respectively.

3. SPECIFICATION OF THE SAMPLING RULE AND THE EXPECTED SAMPLE SIZE 


Suppose we are given $q$ unconnected normal populations with means $\mu_i$
$(i = 1, 2, \ldots, q)$ and unknown common variance $\sigma^2$. We assume $\mu_i = \mu_i^0$ for $i = r+s+1,
r+s+2, \ldots, q$ $(s \ge 0,\ r+s \le q)$, and that the $\mu_i$ for $i = 1, 2, \ldots, r+s$ are unknown.
The two-stage sampling procedure to be discussed, denoted by $S(n_0, k)$, is characterised
by two constants $n_0$ and $k$ which must be chosen so as to meet the
specifications imposed on the nature of the confidence sets or the power functions,
but are otherwise arbitrary. Subject to these constraints then, the design
constants $n_0$ and $k$ should be chosen so as to maximise, as far as possible, the
efficiency (in some sense) of the procedure in the class of all such two-stage sampling
procedures. The maximisation of efficiency will not be considered here, but the reader
is referred to Seelbinder (1953) who discussed the maximisation in one particular
instance. In the sequel we shall be concerned with the estimation (in a confidence
sense) and testing of the $\mu_i$ for $i = 1, 2, \ldots, r$.

The sampling rule may be described as follows. Preliminary independent
random samples, each of size $n_0$, are drawn from the $q$ populations, the result of such a
sampling being a set of observed values $x_{ij}$ $(i = 1, 2, \ldots, q;\ j = 1, 2, \ldots, n_0)$. Further
observations $x_{ij}$ $(i = 1, 2, \ldots, r;\ j = n_0+1, n_0+2, \ldots, n)$ are obtained from the first
$r$ populations, where $n$ is determined by the formula

$$n = \max\left(\{k^2 s_0^2\},\ n_0\right), \qquad (3.1)$$

$s_0^2$ being an (unbiassed) estimate of $\sigma^2$ on $\nu_0 = n_0 q - r - s$ degrees of freedom based on
the preliminary observations,


tts По тіз 70 2 4 7% 
2 — (п9—"—8)-* —X M x Н) | .. (3.2 

% (nog 5 5) [2 A" 3 ( Ұш) | болш e 2% А ) | ( ) 
and $\{c\}$ denotes the smallest integer not less than $c$. It will appear subsequently that
$1/k$ is a studentising scale factor which plays the same role in this variable sample
size procedure as the estimated standard errors of the means play in the more usual
fixed sample size procedures.
The probability distribution of $n$ is clearly a kind of lumped $\chi^2$ distribution.
In fact, from (3.1),

$$\Pr\{n = m \mid \mu, \sigma\} = \begin{cases} 0 & (m < n_0), \\ \Pr\{0 < k^2 s_0^2 \le n_0\} & (m = n_0), \\ \Pr\{m - 1 < k^2 s_0^2 \le m\} & (m > n_0), \end{cases}$$

whence, denoting $\Pr\{n = m \mid \mu, \sigma\}$ by $P_m(\sigma)$,

$$P_m(\sigma) = \begin{cases} 0 & (m < n_0), \\ F_{\nu_0}(n_0 h) & (m = n_0), \\ F_{\nu_0}(mh) - F_{\nu_0}((m-1)h) & (m > n_0), \end{cases} \qquad (3.3)$$

where $h = \nu_0 / k^2 \sigma^2$.
Further, it may be readily shown from (3.3) (cf. Stein, 1945) that

$$n_0 F_{\nu_0}(n_0 h) + (\nu_0/h)\left[1 - F_{\nu_0+2}(n_0 h)\right] \le En \le n_0 F_{\nu_0}(n_0 h) + (\nu_0/h)\left[1 - F_{\nu_0+2}(n_0 h)\right] + \left[1 - F_{\nu_0}(n_0 h)\right], \qquad (3.4)$$

whence

$$En \approx k^2 \sigma^2, \qquad (3.5)$$

the approximation being reasonably good when $\nu_0/h = k^2\sigma^2 > n_0$.
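
The bounds (3.4) and the approximation (3.5) can be checked numerically. The sketch below is an illustration under stated assumptions: it evaluates the $\chi^2$ distribution function by the standard power series for the regularised incomplete gamma function, and it computes $En$ from the identity $En = n_0 + \sum_{m \ge n_0} \Pr\{n > m\}$ with $\Pr\{n > m\} = 1 - F_{\nu_0}(mh)$. The function names, the truncation threshold and the numerical inputs are hypothetical.

```python
import math


def chi2_cdf(x, nu):
    """Distribution function of chi-square on nu degrees of freedom.

    Uses the series gamma(a, z) = z^a e^{-z} * sum_n z^n / (a (a+1) ... (a+n)).
    """
    if x <= 0:
        return 0.0
    a, z = nu / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-17:
        term *= z / (a + n)
        total += term
        n += 1
    return min(1.0, total * math.exp(-z + a * math.log(z) - math.lgamma(a)))


def expected_sample_size(n0, nu0, k, sigma):
    """Return (lower bound, E n, upper bound) in the sense of (3.4)."""
    h = nu0 / (k * k * sigma * sigma)
    f0 = chi2_cdf(n0 * h, nu0)
    lower = n0 * f0 + (nu0 / h) * (1.0 - chi2_cdf(n0 * h, nu0 + 2))
    upper = lower + (1.0 - f0)
    # E n = n0 + sum over m >= n0 of Pr{n > m}, truncated once the tail
    # probability is negligible.
    expected = float(n0)
    m = n0
    while True:
        tail = 1.0 - chi2_cdf(m * h, nu0)
        if tail < 1e-13:
            break
        expected += tail
        m += 1
    return lower, expected, upper


if __name__ == "__main__":
    # Single population: n0 = 10, nu0 = 9, k = 2, sigma = 3, so k^2 sigma^2 = 36.
    print(expected_sample_size(10, 9, 2.0, 3.0))
```

In this example $k^2\sigma^2 = 36$ exceeds $n_0 = 10$, and the computed $En$ lies between the two bounds and within about one observation of $k^2\sigma^2$.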
4. THE EXACT AND LIMITING PROBABILITY DISTRIBUTIONS OF THE TWO-STAGE
SAMPLE MEANS AND OF FUNCTIONS OF THE MEANS


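Denote the $i$-th two-stage sample mean by $\bar{x}_i$,

$$\bar{x}_i = n^{-1} \sum_{j=1}^{n} x_{ij} \qquad (i = 1, 2, \ldots, r),$$

and, for convenience, the vector $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_r)$ by $\bar{x}$. The vector $(\mu_1, \mu_2, \ldots, \mu_r)$ will
similarly be denoted by $\mu$. Then, for a given region $R$ in $r$-space,

$$P(R \mid \mu, \sigma) = \Pr(\bar{x} \in R \mid \mu, \sigma) = \sum_{m=n_0}^{\infty} P_m(\sigma) \Pr(\bar{x} \in R \mid n = m, \mu, \sigma). \qquad (4.1)$$

From the independence of $s_0^2$ and $\sum_{j=1}^{n_0} x_{ij}$ for $i = 1, 2, \ldots, r$, it follows that the (conditional)
distribution of $\bar{x}$ for given $n$ (equivalent, by (3.1), to a constraint on $s_0^2$) is multivariate
normal with expectation vector $\mu$ and variance-covariance matrix $(\sigma^2/m) I$,
where $I$ is the unit $r \times r$ matrix. If then $\xi = (\xi_1, \xi_2, \ldots, \xi_r)$ is a multivariate normal
vector with zero expectation vector and variance-covariance matrix $I$, equation (4.1)
may be stated as

$$P(R \mid \mu, \sigma) = \sum_{m=n_0}^{\infty} P_m(\sigma)\, Q_m(R;\ \mu, \sigma/\sqrt{m}), \qquad (4.2)$$

where

$$Q_m(R;\ \mu, \sigma/\sqrt{m}) = (2\pi)^{-r/2} \int \cdots \int_{\mu + (\sigma/\sqrt{m})\xi \,\in\, R} e^{-\xi'\xi/2}\, d\xi. \qquad (4.3)$$

Equations (4.2) and (4.3) give the required exact probability distribution of
$\bar{x}$. In particular, these equations give the exact distribution function of any
(Borel measurable) function, whether scalar or vector, of $\bar{x}$. Specifically, if
$y = \varphi(\bar{x}) = \varphi(\bar{x}_1, \ldots, \bar{x}_r)$ then the region $R$ corresponding to

$$\Pr\{y_0 < \varphi(\bar{x}) < y_1 \mid \mu, \sigma\} \qquad (4.4)$$

is defined by $y_0 < \varphi(\bar{x}) < y_1$, and

$$Q_m(R;\ \mu, \sigma/\sqrt{m}) = (2\pi)^{-r/2} \int \cdots \int_{y_0 < \varphi(\mu + (\sigma/\sqrt{m})\xi) < y_1} e^{-\xi'\xi/2}\, d\xi. \qquad (4.5)$$

Similarly, if $y = \Phi(\bar{x}) = (\varphi_1(\bar{x}), \varphi_2(\bar{x}), \ldots)$,
then for

$$\Pr\{y_0 < \Phi(\bar{x}) < y_1 \mid \mu, \sigma\} \qquad (4.6)$$

we have

$$Q_m(R;\ \mu, \sigma/\sqrt{m}) = (2\pi)^{-r/2} \int \cdots \int_{y_0 < \Phi(\mu + (\sigma/\sqrt{m})\xi) < y_1} e^{-\xi'\xi/2}\, d\xi. \qquad (4.7)$$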

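To determine the limiting probability distribution of $\bar{x}$, observe that an
alternative form of (4.2) is

$$P(R \mid \mu, \sigma) = \sum_{m=n_0}^{\infty} Q^*\!\left(R;\ \mu, \sqrt{\nu_0}/(k\sqrt{u_m})\right) \Delta_m F_{\nu_0}(u), \qquad (4.8)$$

where $u_m = mh$ and (referring to (3.3))

$$\Delta_m F_{\nu_0}(u) = \begin{cases} 0 & (m < n_0), \\ F_{\nu_0}(u_{n_0}) & (m = n_0), \\ F_{\nu_0}(u_m) - F_{\nu_0}(u_{m-1}) & (m > n_0). \end{cases} \qquad (4.9)$$

Hence,

$$P^*(R \mid \mu) = \lim_{\sigma \to \infty} P(R \mid \mu, \sigma) = \lim_{h \to 0} \sum_{m=n_0}^{\infty} Q^*\!\left(R;\ \mu, \sqrt{\nu_0}/(k\sqrt{u_m})\right) \Delta_m F_{\nu_0}(u) = \int_0^{\infty} Q^*\!\left(R;\ \mu, \sqrt{\nu_0}/(k\sqrt{u})\right) dF_{\nu_0}(u), \qquad (4.10)$$

where, by (4.3),

$$Q^*\!\left(R;\ \mu, \sqrt{\nu_0}/(k\sqrt{u})\right) = (2\pi)^{-r/2} \int \cdots \int_{\mu + (\sqrt{\nu_0}/(k\sqrt{u}))\xi \,\in\, R} e^{-\xi'\xi/2}\, d\xi. \qquad (4.11)$$

Equations (4.10) and (4.11) give the limiting probability distribution of $\bar{x}$ as $\sigma \to \infty$.

We remark that according to equations (4.10) and (4.11) the $\bar{x}_i$ $(i = 1, 2, \ldots, r)$
are distributed in the limit $(\sigma \to \infty)$ as $\mu_i + (\sqrt{\nu_0}/(k\sqrt{u}))\,\xi_i$, where $\xi_1, \xi_2, \ldots, \xi_r$ and $u$ are
independent random variables, the $\xi_i$ being normal with zero means and unit variances,
while $u$ is a $\chi^2_{\nu_0}$. For the right-hand member of (4.11) gives the (conditional) probability
that the vector $\mu + (\sqrt{\nu_0}/(k\sqrt{u}))\,\xi$ falls in $R$ for a fixed value of $u$, and the
right-hand member of (4.10) therefore gives the (unconditional) probability of this
event, whatever be the value of $u$. Equivalently,

$$k(\bar{x} - \mu) \sim \xi/\sqrt{u/\nu_0}, \qquad (4.12)$$

in which the sign $\sim$ in (4.12) is to be interpreted as meaning that the random vectors
$k(\bar{x} - \mu)$ and $\xi/\sqrt{u/\nu_0}$ have identical distributions in the limit as $\sigma \to \infty$. Similarly,
from (4.10) and (4.11),

$$\varphi(\bar{x}) \sim \varphi\!\left(\mu + \xi/(k\sqrt{u/\nu_0})\right) \qquad (4.13)$$

and, more generally,

$$\Phi(\bar{x}) \sim \Phi\!\left(\mu + \xi/(k\sqrt{u/\nu_0})\right). \qquad (4.14)$$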


Since $P^*(R \mid \mu)$ does not depend on $\sigma$, equations (4.10) and (4.11) enable one, using the present
two-stage sampling procedure $S(n_0, k)$, to obtain confidence regions of predetermined
dimensions for the means, or functions of the means, and likewise to obtain tests
relating to the means of predetermined power if $\sigma^2$ is regarded as large, i.e. the
confidence assertions, as well as any assertions on the risks of error (of both the first and
second kinds), will be approximately valid when the right-hand member of (4.10)
is used to evaluate the appropriate probabilities, provided only that $\sigma^2$ is large. We
now proceed to show that under certain conditions these assertions are necessarily
of a conservative character. In other words, confidence probabilities and tests, based
on the procedure $S(n_0, k)$, which use the right-hand member of (4.10) for the evaluation
of probabilities will be better than stated. At the same time, the question of 'how
much better' can to a considerable extent be decided by comparing $P^*(R \mid \mu)$ with
$P(R \mid \mu, \sigma)$, where in the latter function $\sigma$ is assessed by incorporating the entire weight
of evidence, both a priori and a posteriori, from all the observations (not merely those
in the pilot samples).


To prove the conservativeness of the statistical inference, observe that the
right-hand member of equation (4.8) represents the upper (lower) Darboux sum
associated with the division of the positive $u$-axis by the points $u_{n_0}, u_{n_0+1}, \ldots$ and the
integral occurring on the right-hand side of (4.10), if $Q^*$ is strictly increasing
(decreasing) in $u$. On recalling further that $u_m = mh$ $(m = n_0, n_0+1, \ldots)$ with
$h = \nu_0/k^2\sigma^2$, and regarding $P(R \mid \mu, \sigma)$ as a function of $h$, it follows that the effect
of a decrease of $h$ by the factor $\lambda$ (i.e. an increase of $\sigma^2$ by the factor $\lambda$), where $\lambda$
is a positive integer $> 1$, is to superimpose new points of division on the
original set of such points, and consequently to depress (increase) the Darboux
sum. Therefore, denoting for brevity $P(R \mid \mu, \sigma)$ by $\omega(h)$,

$$\begin{aligned} \omega(h/\lambda) &< \omega(h), \quad Q^*(R;\ \mu, \sqrt{\nu_0}/(k\sqrt{u})) \text{ strictly increasing in } u, \\ \omega(h/\lambda) &> \omega(h), \quad Q^*(R;\ \mu, \sqrt{\nu_0}/(k\sqrt{u})) \text{ strictly decreasing in } u, \end{aligned} \qquad (\lambda = 2, 3, \ldots). \qquad (4.15)$$

As an immediate corollary of (4.10) and (4.15), we obtain

$$\int_0^{\infty} Q^*\!\left(R;\ \mu, \sqrt{\nu_0}/(k\sqrt{u})\right) dF_{\nu_0}(u) = \lim_{h \to 0} \omega(h) = \begin{cases} \inf_h \omega(h), & Q^* \text{ strictly increasing in } u, \\ \sup_h \omega(h), & Q^* \text{ strictly decreasing in } u,^{1} \end{cases} \qquad (4.16)$$

or, equivalently, the greatest lower bound (least upper bound) of $P(R \mid \mu, \sigma)$ is attained
asymptotically as $\sigma \to \infty$ if $Q_m(R;\ \mu, \sigma/\sqrt{m})$ is strictly decreasing (increasing) in $\sigma$.

To verify the right half of (4.16) suppose that $Q^*(R;\ \mu, \sqrt{\nu_0}/(k\sqrt{u}))$
is strictly increasing in $u$ (the argument for strictly decreasing $Q^*$ is similar),
and assume the existence of a value of $h$, $h = h'$ (say), for which $\omega(h') < \lim_{h \to 0} \omega(h)$.
Let $\lim_{h \to 0} \omega(h) - \omega(h') = c$ $(> 0)$. Then, by (4.15), $\lim_{h \to 0} \omega(h) - \omega(h'/\lambda) > c$, for $\lambda = 2, 3, \ldots$,
i.e. infinitely many values of $h$ can be found in every arbitrarily small neighbourhood
round $h = 0$ for which the corresponding values of $\omega(h)$ differ from their limiting
value, $\lim_{h \to 0} \omega(h)$, by more than the given quantity $c$. But this contradicts the existence
of the latter limit whose value has, however, been previously determined and is
given in the left half of equation (4.16). Thus, no such value of $h$ as postulated
exists, and $\inf_h \omega(h)$ is given by the limiting value $\lim_{h \to 0} \omega(h)$. Equation (4.16)
is, therefore, proved.
5. SOME EXAMPLES AND APPLICATIONS 

(A) Two-stage sampling from a single normal population. Consider two-stage
sampling from a single normal population $(q = r = 1,\ s = 0)$ with unknown
mean $\mu$ and variance $\sigma^2$, respectively, according to the scheme in (3.1),
i.e.

$$n = \max\left(\{k^2 s_0^2\},\ n_0\right),$$

with

$$s_0^2 = (n_0 - 1)^{-1} \left[ \sum_{j=1}^{n_0} x_j^2 - \Big( \sum_{j=1}^{n_0} x_j \Big)^2 \Big/ n_0 \right]. \qquad (5.1)$$

Here, $\nu_0 = n_0 - 1$.

¹ It seems highly likely on intuitive grounds that $\omega(h)$ is strictly increasing (decreasing) in $h$ if
$Q^*(R;\ \mu, \sqrt{\nu_0}/(k\sqrt{u}))$ is strictly increasing (decreasing) in $u$. However, a formal proof of this has not
been found in the general case.

From (4.12),

$$k(\bar{x} - \mu) \sim t_{n_0-1}, \qquad (5.2)$$

where

$$t_{n_0-1} = \xi/\sqrt{u/(n_0 - 1)}. \qquad (5.3)$$


It follows from (5.2) that to obtain (say) upper confidence intervals for $\mu$
of the form $(\bar{x} - a, \infty)$ with confidence coefficient $1 - \alpha$, where $a$ and $\alpha$ are preassigned
quantities (these intervals having the property that their lower end-points are to the
left of $\mu$ with a probability of $1 - \alpha$), the design constants $n_0$ and $k$ must be related to
the specification constants $a$ and $\alpha$ by the equation

$$ak = t_{n_0-1;\, \alpha}. \qquad (5.4)$$

The confidence intervals are based on the probability relation

$$\Pr\{k(\bar{x} - \mu) < t_{n_0-1;\, \alpha} \mid \mu\} \approx 1 - \alpha. \qquad (5.5)$$

The single constraint (5.4) on the design constants fails to fix them uniquely
but on the contrary provides one degree of freedom for their choice. A rational
choice of $n_0$ and $k$ subject to (5.4) may be obtained as follows. Use (3.3) and (5.4) to
plot $En = \sum_m m P_m(\sigma)$ as a function of $\sigma$, for fixed $a$ and $\alpha$, and for varying $n_0$, so
generating a family of curves; if $\sigma$ is suspected a priori to have an approximate value
of $\sigma'$, choose that value of $n_0$ from the family of curves of $En$ which will minimise $En$
for $\sigma = \sigma'$. (Cf. Seelbinder, 1953.)
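
The one-sided interval $(\bar{x} - a, \infty)$ can be tried out by simulation. The sketch below is an illustration only: it draws two-stage samples from a single normal population according to $n = \max(\{k^2 s_0^2\}, n_0)$ with $k$ fixed by $ak = t_{n_0-1;\alpha}$, and records how often $\bar{x} - a < \mu$. The value 1.8331 is the usual tabulated upper 5 percent point of $t$ on 9 degrees of freedom, supplied here as an assumption; the function names, seed and replication count are hypothetical.

```python
import math
import random


def two_stage_mean(rng, mu, sigma, n0, k):
    """Draw one two-stage sample and return (sample mean, total size n)."""
    first = [rng.gauss(mu, sigma) for _ in range(n0)]
    mean0 = sum(first) / n0
    s0_sq = sum((x - mean0) ** 2 for x in first) / (n0 - 1)
    # {c} is the smallest integer not less than c, i.e. the ceiling.
    n = max(math.ceil(k * k * s0_sq), n0)
    extra = [rng.gauss(mu, sigma) for _ in range(n - n0)]
    return (sum(first) + sum(extra)) / n, n


def simulated_coverage(mu, sigma, n0, a, t_upper, reps=20000, seed=1):
    """Fraction of replications in which the interval (xbar - a, inf) covers mu."""
    rng = random.Random(seed)
    k = t_upper / a  # design relation a k = t_{n0-1; alpha}
    hits = 0
    for _ in range(reps):
        xbar, _n = two_stage_mean(rng, mu, sigma, n0, k)
        if xbar - a < mu:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # n0 = 10, a = 0.5, alpha = 0.05, tabulated t_{9; 0.05} = 1.8331 (assumed).
    print(simulated_coverage(mu=0.0, sigma=1.0, n0=10, a=0.5, t_upper=1.8331))
```

Under these assumptions the simulated coverage should come out at or a little above the nominal 0.95, in line with the conservative character of the procedure discussed later in this section.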


Similarly, to test the one-sided null hypothesis $\mu = \mu_0$ on significance level
$\alpha$ against the set of alternatives $\mu > \mu_0$ the critical region is, from (5.2), defined by

$$k(\bar{x} - \mu_0) > t_{n_0-1;\, \alpha}, \qquad (5.6)$$

and, since

$$k(\bar{x} - \mu_0) = k(\bar{x} - \mu) + k(\mu - \mu_0) \sim t_{n_0-1} + k(\mu - \mu_0), \qquad (5.7)$$

the limiting power $(\sigma \to \infty)$ of this test is

$$1 - G_{n_0-1}\left[t_{n_0-1;\, \alpha} - k(\mu - \mu_0)\right]. \qquad (5.8)$$

The power function is naturally strictly increasing in $\mu - \mu_0$, and the test is insensitive
to values of $\mu < \mu_0$.


For all real and positive values of $k$, once $n_0$ is chosen there are an infinity
of rejection criteria, and a corresponding infinity of power functions given by (5.8).
To fix the test in any instance, additional information other than that the size of the
critical region is $\alpha$ is needed. In fact, when $n_0$ has been chosen, two points on the power
curve are needed to fix the latter completely. A further imposed condition might be
that when $\mu \ge \mu_0 + \delta$, the probability of an error of the second kind should not exceed
$\beta$, where $\delta$ is a given positive number and $\beta$ is some given (small) positive quantity
(say, 0.05 or 0.01). We then require that

$$\beta = G_{n_0-1}\left(t_{n_0-1;\, \alpha} - k\delta\right), \qquad (5.9)$$

giving

$$t_{n_0-1;\, \alpha} - k\delta = -t_{n_0-1;\, \beta}, \qquad (5.10)$$

whence

$$k = \left(t_{n_0-1;\, \alpha} + t_{n_0-1;\, \beta}\right)/\delta \qquad (5.11)$$

and, by (3.5),

$$En \approx \left(t_{n_0-1;\, \alpha} + t_{n_0-1;\, \beta}\right)^2 \sigma^2/\delta^2. \qquad (5.12)$$

As for the confidence interval case, $n_0$ and $k$ should be chosen, subject to (5.10), so as
to minimise $En$ around the true suspected value of $\sigma$.
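
Relations (5.11) and (5.12) amount to a short piece of arithmetic once the two percentage points of $t$ are read from tables. The sketch below assumes $n_0 = 10$ with the usual tabulated values $t_{9;\,0.05} = 1.8331$ and $t_{9;\,0.01} = 2.8214$; these values, the function name and the choices $\delta = 0.5$, $\sigma = 1$ are illustrative assumptions.

```python
def one_sided_design(t_alpha, t_beta, delta, sigma_guess):
    """Return (k, approximate E n) from relations (5.11) and (5.12).

    t_alpha, t_beta -- upper alpha and beta points of t on n0 - 1 degrees
                       of freedom, taken from tables by the caller
    delta           -- shift mu - mu0 at which the second-kind error is beta
    sigma_guess     -- a priori guess at sigma, used only for E n
    """
    k = (t_alpha + t_beta) / delta
    expected_n = (k * sigma_guess) ** 2  # E n ~ k^2 sigma^2, by (3.5)
    return k, expected_n


if __name__ == "__main__":
    k, expected_n = one_sided_design(1.8331, 2.8214, delta=0.5, sigma_guess=1.0)
    print(k, expected_n)
```

With these inputs $k$ is a little over 9.3, so the expected sample size is roughly 87 when $\sigma = 1$; repeating the calculation over several values of $n_0$ gives the family of curves from which $n_0$ is to be chosen.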


The estimation and test procedures are both conservative. Referring to (5.5)
and (4.3) the value of $Q_m$ for the estimation procedure is

$$\Phi\!\left(\frac{t_{n_0-1;\, \alpha}\sqrt{m}}{k\sigma}\right), \qquad (5.12)$$

where $\Phi(\cdot)$ denotes, as usual, the distribution function of a normal variate with zero
mean and unit variance. Since $Q_m$ is decreasing in $\sigma$, the value of the left-hand member
of (5.5), and, therefore, also the confidence coefficient, is greater than $1 - \alpha$, whatever
be the 'true' value of $\sigma$. The true value of the confidence coefficient is

$$\sum_{m=n_0}^{\infty} P_m(\sigma)\, \Phi\!\left(\frac{t_{n_0-1;\, \alpha}\sqrt{m}}{k\sigma}\right). \qquad (5.13)$$

Again, on referring to (5.6) and (4.3), we find that the value of $Q_m$ for the test
procedure is

$$1 - \Phi\!\left[\left(\frac{t_{n_0-1;\, \alpha}}{k} - (\mu - \mu_0)\right)\frac{\sqrt{m}}{\sigma}\right]. \qquad (5.14)$$

This is strictly increasing or decreasing in $\sigma$ according as to whether
$0 < \mu - \mu_0 < t_{n_0-1;\, \alpha}/k$ or $\mu - \mu_0 > t_{n_0-1;\, \alpha}/k$, and is identically equal to $\tfrac{1}{2}$ when
$\mu - \mu_0 = t_{n_0-1;\, \alpha}/k$. Hence, the true power of the test, given by

$$\sum_{m=n_0}^{\infty} P_m(\sigma) \left\{ 1 - \Phi\!\left[\left(\frac{t_{n_0-1;\, \alpha}}{k} - (\mu - \mu_0)\right)\frac{\sqrt{m}}{\sigma}\right] \right\}, \qquad (5.15)$$

is likewise increasing in $\sigma$ if $0 < \mu - \mu_0 < t_{n_0-1;\, \alpha}/k$ and decreasing in $\sigma$ if $\mu - \mu_0 >
t_{n_0-1;\, \alpha}/k$, while it is identically equal to $\tfrac{1}{2}$ when $\mu - \mu_0 = t_{n_0-1;\, \alpha}/k$ (i.e. the test is
insensitive to variations in $\sigma$ when $\mu$ lies in the neighbourhood of $\mu_0 + t_{n_0-1;\, \alpha}/k$).
In brief, the true power of the test, as given by (5.15), is better than the limiting
power in (5.8), obtained by letting $\sigma \to \infty$, in the sense that it is less than the limiting
power for small or moderate $\mu$ and greater than the limiting power for large $\mu$.

The procedures for obtaining two-sided confidence intervals and tests are
quite similar. Thus, to obtain two-sided, symmetrical confidence intervals for $\mu$
of preassigned length $2a$ and confidence coefficient $1 - \alpha$, the constants $n_0$ and $k$ are
related to $a$ and $\alpha$ by

$$ak = t_{n_0-1;\, \alpha/2}, \qquad (5.16)$$

the confidence intervals being $(\bar{x} - a, \bar{x} + a)$, and based on the probability relation

$$\Pr\{|k(\bar{x} - \mu)| < t_{n_0-1;\, \alpha/2} \mid \mu\} \approx 1 - \alpha. \qquad (5.17)$$

In the same way, to test $\mu = \mu_0$ against $\mu \ne \mu_0$ on significance level $\alpha$, the critical
region is

$$|k(\bar{x} - \mu_0)| > t_{n_0-1;\, \alpha/2}. \qquad (5.18)$$

The limiting power of the test, by (5.2) and (5.18), is

$$1 - \left[ G_{n_0-1}\!\left(t_{n_0-1;\, \alpha/2} - k(\mu - \mu_0)\right) - G_{n_0-1}\!\left(-t_{n_0-1;\, \alpha/2} - k(\mu - \mu_0)\right) \right]. \qquad (5.19)$$

The test is clearly unbiassed and its power function is monotonic increasing in
$|\mu - \mu_0|$.


To fix the test and power function uniquely once $n_0$ has been chosen, an
additional specification is required of the type that when $|\mu - \mu_0| \ge \delta$, the probability
of an error of the second kind does not exceed $\beta$, $\delta$ being a fixed, given positive
quantity and $\beta$ a fixed, given (small) positive number (say, 0.05 or 0.01). Thus,

$$\beta = G_{n_0-1}\!\left(t_{n_0-1;\, \alpha/2} - k\delta\right) - G_{n_0-1}\!\left(-t_{n_0-1;\, \alpha/2} - k\delta\right), \qquad (5.20)$$

from which $k$ may be determined numerically or graphically for given $\alpha$, $\beta$, $\delta$ and for
varying $n_0$. (The right-hand member of (5.20) is strictly decreasing in $k$ and ranges
in value from $1 - \alpha$ when $k = 0$ to zero as $k \to \infty$.) In principle, $n_0$ and $k$ should be
chosen, subject to (5.20), so as to minimise $En$, as before, around the true value of $\sigma$.


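The estimation and test procedures are again conservative. For the estimation
procedure, $Q_m$ is (refer to (5.17) and (4.3))

$$2\Phi\!\left(\frac{t_{n_0-1;\, \alpha/2}\sqrt{m}}{k\sigma}\right) - 1, \qquad (5.21)$$

which is decreasing in $\sigma$. Thus, the left-hand member of (5.17) is likewise decreasing
in $\sigma$ and the confidence coefficient is, therefore, greater than $1 - \alpha$, irrespective of $\sigma$.
The true confidence coefficient is

$$\sum_{m=n_0}^{\infty} P_m(\sigma) \left[ 2\Phi\!\left(\frac{t_{n_0-1;\, \alpha/2}\sqrt{m}}{k\sigma}\right) - 1 \right]. \qquad (5.22)$$

For the test procedure, $Q_m$ is

$$1 - \Phi\!\left[\left(\frac{t_{n_0-1;\, \alpha/2}}{k} - (\mu - \mu_0)\right)\frac{\sqrt{m}}{\sigma}\right] + \Phi\!\left[\left(-\frac{t_{n_0-1;\, \alpha/2}}{k} - (\mu - \mu_0)\right)\frac{\sqrt{m}}{\sigma}\right]. \qquad (5.23)$$

This function increases steadily with $\sigma$ for $0 < |\mu - \mu_0| < t_{n_0-1;\, \alpha/2}/k$ and decreases
steadily with $\sigma$ for sufficiently large $|\mu - \mu_0|$, while for moderate values of $|\mu - \mu_0|$ it is hardly
affected by variations in $\sigma$. Hence, the true power of the test, given by

$$\sum_{m=n_0}^{\infty} P_m(\sigma) \left\{ 1 - \Phi\!\left[\left(\frac{t_{n_0-1;\, \alpha/2}}{k} - (\mu - \mu_0)\right)\frac{\sqrt{m}}{\sigma}\right] + \Phi\!\left[\left(-\frac{t_{n_0-1;\, \alpha/2}}{k} - (\mu - \mu_0)\right)\frac{\sqrt{m}}{\sigma}\right] \right\}, \qquad (5.24)$$

is better than the limiting value given in (5.19).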


(В) Two-stage sampling from a finite set of normal populations with common 
variance, Consider 4 separate normal populations with unknown common variance 
o*, and with means given thus : 


Ex, = | t= 1,2,..., r+8(7+8 $820), 7) 
=0, t=r+s+]1, r+s+2,...,¢. 


(5.25) 


Hyperspherical confidence regions of predetermined volume content and confidence
coefficient $1-\alpha$ may be obtained for the set of means $\mu_i$ ($i = 1, 2, \ldots, r$) if sampling
is carried out in two stages, as in Section 1, according to the scheme of equation (3.1)
with


2 ата Ж 50 ла 5.26 
Ик E xg is 2 ag Im, AGS 


the degrees of freedom of $s_0^2$ being $\nu_0 = n_0 q - r - s$.
On setting
$$\varphi(\bar{x}) = (\bar{x}-\mu)'(\bar{x}-\mu)$$


in (4.13), with (as before) $\bar{x} = (\bar{x}_1, \ldots, \bar{x}_r)$, $\mu = (\mu_1, \ldots, \mu_r)$, we have


ВЕ А 
Р-Р ~ ра аро eee 
or т (Zu) ~ 5 Ww (5.28) 


where the components of $\xi$ are independent normal random variables with zero means
and unit variances, $u$ is a $\chi^2_{\nu_0}$, and $\xi$ and $u$ are independently distributed. Thus,

0 
E > CEAN “ғ Fr, vy toe (5.29) 
та 


and the required confidence regions are defined by 


рау 
О А) О у ды; ... (5.30) 
TUE a 


the corresponding probability relationship being 
BA 
Pr { 5 EG. p) —1—. ... (5.31) 


It follows from (5.29) that to obtain confidence regions for the $\mu_i$ ($i = 1, 2, \ldots, r$)
of the form


$$\sum_{1}^{r} (\bar{x}_i-\mu_i)^2 < \delta \qquad (5.32)$$


with confidence coefficient $1-\alpha$, $\delta$ and $\alpha$ being specified in advance (this region
representing the interior of a random hypersphere in the space of the $\mu_i$ ($i = 1, 2, \ldots, r$)
with centre $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_r)$ and fixed radius $\sqrt{\delta}$), the following relation must hold:


$$\delta = r k^{-1} F_{r,\,\nu_0;\,\alpha}. \qquad (5.33)$$


The fixed sample size $n_F$ required to obtain a confidence region (interior of
hypersphere) of the same volume and with the same confidence coefficient for known
$\sigma$ is given by


$$\Pr\left\{ n_F \sum_{1}^{r} (\bar{x}_i-\mu_i)^2/\sigma^2 < \delta n_F/\sigma^2 \right\} = 1-\alpha, \qquad (5.34)$$


whence
$$\delta n_F/\sigma^2 = \chi^2_{r;\,\alpha}. \qquad (5.35)$$


Provided then that $\sigma^2$ is not excessively small, the efficiency of the two-stage
procedure is, by (5.35), (5.33) and (3.5),


$$\frac{n_F}{En} = \frac{\chi^2_{r;\,\alpha}}{r F_{r,\,\nu_0;\,\alpha}} = \frac{F_{r,\,\infty;\,\alpha}}{F_{r,\,\nu_0;\,\alpha}}. \qquad (5.36)$$


The same remarks about a rational choice of $n_0$ and $k$, subject to the constraint
(5.33), as were made previously in relation to the simpler one-population case apply
here with equal force, the purpose being once again to minimise the average amount
of sampling, as far as possible.


Finally, in this instance (refer to (5.32), (5.33) and (4.3))


$$Q_n = F_r\left(\frac{n\delta}{\sigma^2}\right). \qquad (5.37)$$

Since $Q_n$ is decreasing in $\sigma$, so is the left-hand member of (5.31) and the true confidence
coefficient actually exceeds $1-\alpha$. The true confidence coefficient is


$$\sum_{n} p_n(\sigma)\, F_r\left(\frac{n\delta}{\sigma^2}\right). \qquad (5.38)$$

We now proceed to discuss the testing for the set of means, specifically to test
for the general linear hypothesis. We assume that the linear hypothesis has first
been reduced to canonical form. The problem may then be formulated as follows.
Given $q$ ($> 1$) separate normal populations with unknown common variance $\sigma^2$ and
with means as in (5.25), we require to test the hypothesis $H_0 : \mu_i = 0$ ($i = 1, 2, \ldots, r$)
with power to be (effectively) independent of $\sigma^2$.

This can be achieved if sampling is carried out in two stages, exactly as in
Section 3, i.e. equations (3.1) and (5.26) are still valid. The test procedure consists
in rejecting $H_0$ when


А T 
2 > ia Tus ... (5.39) 


where $\nu_0 = n_0 q - r - s$, the corresponding probability relationship being


| Ho} - а. ... (5.40) 


"70 


[ЯР 
T 


То determine the limiting power (70) of this test, we need the limiting 


distribution of X’ x’ = X аф for arbitrary ш = (JA, е» т). Let the region of Section 
1 Д 


4 be defined by 


р:Ха<6; mE) 
L 
for arbitrary 6. Then, 
ФВ; po] = т)" |... | Са 


#+(т| тувЕ 


“ү pe Г Ba 


Sut сш ym) < à 


Күк |... | Сез 
(шн то)? <т8|0° 


- ve me ЖЕЗ ) , а (6.42)
where $\psi_r(v; \kappa)$ is the distribution function of a non-central $\chi^2$ with $r$ degrees of freedom


А Ld 
and with non-centrality parameter к, іе. the distribution of X (E — e)? with 
1 


T 5 с = к. The limiting distribution function of 5 2? is therefore, by (4.10), 
1 1 


AAAs p) = T удри; Rudy, (и), ... (5.43) 


LÁ 
where р = #24/у, A = 2 X Шә, Since the density function of the non-central 
E , 


x? is 


e Қо к) = фк TED ок „- piir- TP (k4/v) 


(see for instance, Patnaik, 1949), $I$ denoting the Bessel Function of the first kind with
purely imaginary argument, differentiation under the integral sign in (5.43) with
respect to $\rho$ yields (after a little algebra)


4-8 | ә 
9 B*A3p) = 5 ; r( 2 P ) 4 Ариф 


P (5 1) p (narco Eaa s s 049 


On integrating term by term in (5.45) with respect to p, we find 


Баста Nena s BILE 


1-0 


mars |, 
ers Ian (ia У), 2. (5.46) 
м 


where in (5.46) $I$ denotes the Incomplete Beta Function ratio. Note that


BMA: р) = PHU È ната); Раде атарға). (5.47) 


also gives, by (5.39), the limiting probability of an error of the second kind,


lim Pr { Eat<é|pol,o— 7 


eats pz P, foQ—r—58; а, 
The limiting power function is, of course, $1-\beta^*(\lambda; \rho)$.

It is easily seen that $\partial\psi_r(v; \kappa)/\partial\kappa < 0$. Hence, from (5.43), the limiting power
function is strictly increasing in $\lambda$, and the test is unbiassed. In practice, we should,
as a rule, require only that


РҚА; "Ета-ь-в; af (ngg—r—8)) = fs вав) 


where $\lambda'$ and $\beta'$ are given positive quantities and $\beta'$ will, as a rule, be small. The latter
condition can be met by an appropriate unique choice of the scale constant $k$ for
every fixed $n_0$, and analogously to the previous examples the choice of $n_0$ and $k$ should
be such as to minimise $En$ around the most likely value of $\sigma$.


We remark that Stein (1945) gives an integral form* for the power function 
which may be shown to be equivalent to the more convenient series expansion in 
(5.46). 


The statements made in connexion with the limiting value of the power
function for the two-stage test of Student's hypothesis may be, in essentials, duplicated
here. The test is slightly better than is expressed by (5.47), in the sense that
for all finite $\sigma$ the true power function, which by (5.39) and (5.42) is given by


1—/(А; Раадт; а](00—"—8)30) 


" r g f 
i 1-% Pulo) у, (nm rtt. 75 J2 Ea). ss (5.49) 


oN gii 


satisfies the inequalities 
BAS РР ак mp 17850) > BAAS tF qos 177-8), 0 & А SN, ) 


< #*(3; "Инди в/(тЯ-—"—8)), АА y fl) 


where $\lambda''' > \lambda''$, while for values of $\lambda$ in the range $(\lambda'', \lambda''')$ the power function is
more or less insensitive to variations in $\sigma$. The inequalities in (5.50) derive from the
monotonic increasing character of $\psi_r(\rho u; \lambda u)$, and therefore of $Q^*$, in $u$ when $\lambda$ is
small ($Q^*$ is defined in (4.11)). Thus, for $\lambda = 0$, it has been established immediately
after (5.37) that


$$\beta\left(0;\ rF_{r,\,n_0q-r-s;\,\alpha}/(n_0q-r-s);\ \sigma\right) > \beta^*\left(0;\ rF_{r,\,n_0q-r-s;\,\alpha}/(n_0q-r-s)\right),$$


and general considerations of continuity show that $\psi_r(\rho u; \lambda u)$ must be increasing
in $u$ near $\lambda = 0$. On the other hand, $\psi_r(\rho u; \lambda u)$ is the total probability mass,
under a spherical normal distribution with zero means and unit variances, in a
hypersphere with centre at $\sqrt{\lambda u}$ (which may for convenience be taken to lie along one
of the coordinate axes) and radius $\sqrt{\rho u}$. Hence, it is clear that an infinitesimally
small increase in $u$ will affect the location of the centre (which moves away from the
centre of the distribution) more strongly than the length of radius, when $\lambda$ is
sufficiently large. A net decrease of the probability mass accordingly results, i.e.,
$\partial\psi_r(\rho u; \lambda u)/\partial u < 0$ for sufficiently large $\lambda$. This property may also be proved
analytically (if rather tediously).


*Stein's integral is not the integral we have used, viz., $1-\int_0^\infty \psi_r(\rho u; \lambda u)\,dF_{\nu_0}(u)$, to obtain the
series expansion in (5.46), but the two integrals must be equivalent.


(C) Joint one-sided confidence intervals for a finite set of means. Given
the populations as in 5(B), with sampling carried out in two stages also as in 5(B),
confidence regions for the vector $\mu = (\mu_1, \mu_2, \ldots, \mu_r)$ of the form $(\bar{x}-a, \infty)$ (i.e. infinitely
extended orthotopes) with confidence coefficient $1-\alpha$, where $a = (a_1, a_2, \ldots, a_r)$ and
$\alpha$ are predetermined, may be obtained analogously to the one-sided confidence intervals
for the case of a single population (though with greater computational difficulties
because of the lack of tables for the distribution of the ensuing statistics).


Referring to (4.3) and (4.11), the functions Q,, and Q* for the region R defined 


by 
В:Х < ра (551) 
: кеа Lol d A eré -% Д — Г 4 5.52 
are given by Qn = (27) d 1 Ё 5a Me II Ф ( г ул ) ... (5.52) 
and 0* = Г { Il Ф (асу) } dF, (и), i ... (5.53) 
0 1 


with $\nu_0 = n_0 q - r - s$ (as before). Consequently, $n_0$ and $k$ must satisfy
© r А 
осту) } ars (и) = 1-а. 2. (6.54) 
0 1 


We note that Q* is increasing in v, so that the confidence coefficient is certainly greater 
than 1—a. 


Since the $a_i$ are arbitrary, equation (5.53) gives the joint distribution of the
$\bar{x}_i-\mu_i$ ($i = 1, 2, \ldots, r$). The joint distribution of the scaled random variables $z_i$, with
$z_i = \sqrt{k}\,(\bar{x}_i-\mu_i)$, is a form of the generalized $t$-distribution (see Dunnett and Sobel, 1954, 1955;
cf. (4.12) with (5.2)), the corresponding density function of which is readily obtained from (5.53) by
differentiation with respect to the $a_i$ as


(5.55) 


duca (+ i | +) 


(wr ups m 


Joint one-sided tests for the $\mu_i$ with power independent of the variance may
clearly be obtained in the same manner. Such tests are conservative in the sense
discussed previously.


6. FURTHER DISCUSSION 


(i) Suppose $\sigma$ is regarded as having some prior distribution function $K(\sigma)$.
Then


$$\Pr\{\bar{x} \in R \mid \mu, K\} = \int_0^\infty P(R \mid \mu, \sigma)\,dK(\sigma) > P^*(R \mid \mu), \quad P(R \mid \mu, \sigma) \text{ decreasing in } \sigma,$$
$$\Pr\{\bar{x} \in R \mid \mu, K\} < P^*(R \mid \mu), \quad P(R \mid \mu, \sigma) \text{ increasing in } \sigma, \qquad (6.1)$$


where $P(R \mid \mu, \sigma)$ and $P^*(R \mid \mu)$ are defined in (4.1) and (4.10), respectively. This means
that the previous test procedures remain valid even if $\sigma$ is not assumed to be constant
but has some Bayesian distribution associated with it, provided $P(R \mid \mu, \sigma)$ is monotone
in $\sigma$. The tests are still conservative in character if $P^*(R \mid \mu)$ is given as the power
function.


If, further, both $\mu$ and $\sigma$ are regarded as having a joint prior distribution
$K(\mu, \sigma)$ and $P(R \mid \mu, \sigma)$ is independent of $\mu$, $P(R \mid \mu, \sigma) = P(R \mid \sigma)$, then


$$\Pr\{\bar{x} \in R \mid K\} = \int_{-\infty}^{\infty}\int_0^\infty P(R \mid \sigma)\,dK(\mu, \sigma) > P^*(R), \quad P(R \mid \sigma) \text{ decreasing in } \sigma, \qquad (6.2)$$

where
$$P^*(R) = \lim_{\sigma \to \infty} P(R \mid \sigma). \qquad (6.3)$$


This implies that the previous estimation procedures for the $\mu_i$, or of functions of the
$\mu_i$, by means of confidence regions likewise remain valid when $\mu$ and $\sigma$ have prior
distributions, provided $P(R \mid \sigma)$ is decreasing in $\sigma$. The estimation procedures are
still conservative in character if $P^*(R)$ is given as the confidence coefficient.


(ii) The appropriate probabilities for the test and estimation procedures
have been obtained previously by letting $\sigma \to \infty$, and the conservative character of the
procedures deduced from considerations of monotonicity with respect to $\sigma$ of the
corresponding (conditional) probabilities for fixed total sample size. In practice,
however, one would almost always be in a position to set an upper bound to $\sigma$, say
$\sigma''$ (this in itself implies a weak specification for the prior distribution of $\sigma$, cf. (i)),
and, therefore, to improve on the 'limiting' power or confidence coefficient attained
asymptotically as $\sigma \to \infty$. As an example, the confidence coefficient for the case of a
single mean and symmetrical confidence intervals of width $2\delta$ would be stated as
at least


$$C(\sigma'') = \sum_{n} p_n(\sigma'')\left\{2\Phi\left(\frac{\delta\sqrt{n}}{\sigma''}\right)-1\right\}$$


(see (5.22)) rather than at least $1-\alpha$.*


(iii) Two-stage sampling procedures of the type discussed in this paper have
been criticised in the past (Ruben, 1950; Lindley, 1958; Savage, 1959) on the general
grounds of inefficiency, inasmuch as the observations from the second-stage sample
are not used to supply information about $\sigma$. However, the loss in information can to
some extent be recovered. As an example, consider again the case of symmetrical
confidence estimation of a single mean. The mere conservative assertion that the
confidence coefficient is greater than $1-\alpha$, though suitable for circumstances where
the loss function is of such a character as to make the production of intervals with a
covering probability for $\mu$ less than $1-\alpha$ unpleasant or undesirable or dangerous, is
nevertheless wasteful of information inasmuch as an infinitely high value of $\sigma$ is assumed.
This is merely another (though suggestive) aspect of the above criticism that not all
the observations are used in assessing the variability of the population. Nevertheless,
in principle there is nothing to prevent the statistician using all the observations to
estimate $\sigma^2$ and thereby obtaining a better notion of the covering probability $C(\sigma)$.
Thus, if $s$ is the estimate of $\sigma$ based on all the observations, calculated in the usual way,
one could rather loosely say that the true covering probability is about $C(s)$, and
a somewhat more comprehensive, if rather imprecise, assertion could be obtained by
setting an upper bound, in a statistical sense, for $\sigma$ based on $s$. For instance, having
computed $s$ one could then proceed to compute $s' = s + 1.28$ (standard error of $s$),
the standard error of $s$ being derived approximately by regarding $n$ as fixed and so
computed by the usual large sample formula. To illustrate by a numerical example, suppose
that $C(s') = 0.90$ and $\alpha = 0.05$. Since the value $s'$ corresponds to the ordinate in the
approximate distribution of $s$ which cuts off a probability of 0.10 to the right, one
could then assert that in addition to the true covering probability being with absolute
certainty greater than 0.95 the odds are 9 to 1 that the true covering probability is
not less than 0.90. In brief, there is no real reason why the data should be 'reduced'
to the extent that only a single number, viz. the confidence coefficient, is
supplied as a measure of reliability of the estimate of the mean. Reduction or
summarisation of data may frequently be unrealistic inasmuch as it artificially narrows
the interpretative scope of a body of data. In our case, it appears to be more realistic
to supply an ancillary statement of the type just mentioned which shall incorporate
the additional information relevant to $\sigma$ supplied by the additional observations in
the second sample, or perhaps better still one could supply a graph or table, and ask
any interested party to draw his own conclusions or make his own decisions contingent
on any particular sample.


*We assume here that $C(\sigma)$ is strictly decreasing in $\sigma$ (see footnote on p. 7), and numerical work
seems to confirm this assumption.


We may approach the matter from yet another angle. It has been shown
previously that the efficiency of the procedure discussed here in relation to the optimum
fixed sample size procedure is $(t_{\infty;\,\alpha/2}/t_{n_0-1;\,\alpha/2})^2$ when $\sigma^2$ is large. The efficiency is
therefore quite high for large $\sigma$ even if $n_0$ is only moderately large. On the other
hand, if $\sigma$ is small $s$ is also likely to be small and, concomitantly, $C(s)$ is likely to be
large (i.e. near to 1); and this will be so even if $s_0^2$, and consequently $n$, are abnormally
large, the high value of $C(s)$ thus to some extent compensating for the large amount of
sampling.


We emphasize that this discussion is somewhat imprecise (better methods
of recovering the lost information about $\sigma$ may, and almost certainly do, exist), but
this fault appears to be part of the price that one must be prepared to pay for having
to meet the rather difficult (and perhaps artificial) objective of deriving estimation
intervals of fixed length and being at the same time restricted to only two-stage rather
than fully sequential sampling. The situation would naturally be different if it were
desired to control the width of the confidence intervals, again of preassigned confidence
coefficient, not absolutely but only stochastically, e.g. by imposing some such condition
that the probability of the length of an interval exceeding a specified value be not greater than some
specified (small) upper bound. In that situation, one would not need to have recourse
to any such circuitous manner of recovering information about $\sigma^2$ as described above,
since the predetermined confidence coefficient objective together with the ancillary
objective of the stochastic control of the confidence intervals, between them do not
impose such drastic requirements on the reduction of data as does a mere specification
of confidence coefficient and length of the confidence intervals. One could then use
all the observations for the estimation of $\sigma^2$ more directly.


TESTING FOR THE SINGLE OUTLIER IN A
REGRESSION MODEL


By K. S. SRIKANTAN*
Indian Statistical Institute


SUMMARY. The general problem of testing a regression model against an alternative hypothesis
of a single outlier is treated in this paper. Certain test criteria are developed and their 'nominal percentage
points', which control the significance of the tests at or below prescribed levels, are obtained.


1. INTRODUCTION


An experimenter may be confronted with the hypothesis that the observations 
he has obtained are independent and normal with a linear regression on a known 
set of variables. He may contemplate an alternative in which one or a specified
number of observations, though he does not know precisely which, can deviate from the 
assumed regression or can have increased variability. Here we have to test the hypo- 
thesis of the given regression against the alternative of a specified number of ‘outliers’. 
The test should be formulated in such a manner that it is ‘best’ in detecting any of the 
alternatives when they happen to be true and, at the same time, controls the error 
of rejecting the hypothesis, when true, at a preassigned level of probability. This is 
the problem of testing for outliers. 

This problem differs from the analysis of variance. In the latter the regression
model is tested against all possible alternatives of shift in the means of the observations.
But in the outlier problem the alternatives are restricted to the deviation
in the mean of one or a specified number of any of the observations from the given
type of regression.

It is seen, therefore, that the outlier test is appropriate to a situation in which
we possess the additional information (over and above that required for the analysis
of variance) that only a certain number of any of the observations can deviate, in
their mean, from the assumed regression. In some problems such information is
forthcoming. For example, in factories, it might be found, from past experience,
that at a time only one or two machines went out of alignment.

When the number of outliers contemplated in the alternative hypothesis is
just one, the test is called a single outlier test. This paper deals with such tests.

The problem of testing for a single outlier when the hypothesis is that all the
observations come from identical and independent normal populations with the same
but unknown mean and variance has been completely solved. The first significant
contribution towards this problem was made by Pearson and Chandra Sekhar
(1936) based on the work of Thompson (1935). They suggested certain test criteria
and also obtained their percentage points in small samples. Later on percentage
points for these criteria were obtained for larger sample sizes also by Grubbs (1950).
Percentage points for a slightly different criterion involving an external estimate
of variance were obtained by a different method by Nair (1948).

* Now with Central Statistical Organisation, New Delhi.


The present paper deals with the more general problem of the single outlier 
in a regression model. This, no doubt, is a more difficult problem. In this case, the 
distributions of the test criteria, apart from being difficult to solve, should involve, 
when solved, as parameters, certain functions of the known variables on which the 
observations have a linear regression. In order to avoid this only the ‘nominal 
percentage points’ of these criteria have been obtained. The ‘nominal percentage 
points’ always control the error of the first kind at a level not exceeding the specified 
one; and under some conditions, which hold in restricted cases in small samples, these 
points control the error of the first kind actually at the specified level. ‘Nominal 
percentage points’ are used in this special sense throughout this paper. 

The method followed in this paper, for obtaining the nominal percentage 
points, would be seen to resemble in some respects that adopted by Pearson and 
Chandra Sekhar (1936) and in certain others that of Cochran (1941). In the next
section we formulate the problem of the single outlier and the criteria for its 
detection. 


2. SINGLE OUTLIER IN A REGRESSION MODEL 


2.1. Preliminaries. Let $y = (y_1, y_2, \ldots, y_n)$ be $n$ independently and normally
distributed random variables with the same unknown variance $\sigma^2$ and means to
be specified later according to the alternative hypotheses considered. Let
$\beta = (\beta_1, \beta_2, \ldots, \beta_m)$ be $m$ unknown parameters and $X = (x_{ij})$ be a matrix of $m \times n$
known constants of rank $m$ where $m < n$. Further let

$$\mu_j = \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_m x_{mj}. \qquad (2.1)$$

If $E(y_j) = \mu_j$, $j = 1(1)n$, then the least square estimate of $\beta$ is given by

$$b = yX'(XX')^{-1}. \qquad (2.2)$$

We shall represent the deviations of the observed values from their regression estimates
by the vector $e = (e_1, e_2, \ldots, e_n)$; thus

$$e = y - bX. \qquad (2.3)$$

It could be shown that the dispersion matrix of $e$ is $A\sigma^2$ where $A$ is the $n \times n$ matrix

$$A = (A_{ij}) = I - X'(XX')^{-1}X. \qquad (2.4)$$

The error sum of squares will reduce to

$$S^2 = ee' = yAy'. \qquad (2.5)$$

We shall use the following statistics:

$$d_i = e_i/(A_{ii}S^2)^{1/2}, \qquad (2.6)$$
$$t_i = d_i^2, \qquad (2.7)$$
$$u_i = d_i^2 \ \text{if } d_i \ge 0 \quad \text{and} \quad u_i = 0 \ \text{if } d_i < 0, \qquad (2.8)$$
$$l_i = 0 \ \text{if } d_i \ge 0 \quad \text{and} \quad l_i = d_i^2 \ \text{if } d_i < 0. \qquad (2.9)$$

It is well known from the general theory of least squares that $d_i$ is
symmetrically distributed and that $t_i$ follows the Beta distribution with parameters
$(\tfrac{1}{2}, (n-m-1)/2)$.
Consequently, for any $x > 0$ we have

$$\text{Prob}(u_i > x) = \text{Prob}(d_i > \sqrt{x}) = \tfrac{1}{2} - \tfrac{1}{2} I_x(\tfrac{1}{2}, (n-m-1)/2) \qquad (2.10)$$

where

$$I_x(p, q) = \int_0^x z^{p-1}(1-z)^{q-1}\,dz \Big/ B(p, q). \qquad (2.11)$$

Similarly for $x > 0$,

$$\text{Prob}(l_i > x) = \text{Prob}(d_i < -\sqrt{x}) = \tfrac{1}{2} - \tfrac{1}{2} I_x(\tfrac{1}{2}, (n-m-1)/2). \qquad (2.12)$$


2.2. Test for a single specified outlier. When it is specified that the $i$-th
observation is an outlier, we can set up

$$E(y_j) = \mu_j, \quad j \ne i,$$

and

$$E(y_i) = \mu_i + \delta_i \quad (i \text{ specified and } \mu_j \text{ given by (2.1)}),$$

and express the null hypothesis $H_0$ and the alternative hypothesis $H$ in terms of $\delta_i$
as follows:

$$H_0 : \delta_i = 0$$
$$H : \delta_i \ne 0.$$

Let $h_\alpha$ be the upper $200\alpha\%$ point and $k_\alpha$ the upper $100\alpha\%$ point of the distribution
of $t_i$. It could be easily shown that against one sided alternatives, uniformly most
powerful similar regions exist and, corresponding to the level of significance $\alpha$, these
are given by

$$u_i > h_\alpha \ \text{if } \delta_i > 0 \qquad (2.13)$$

and

$$l_i > h_\alpha \ \text{if } \delta_i < 0. \qquad (2.14)$$

Against both sided alternatives, the best unbiassed test, which controls the error of
the first kind at the level $\alpha$, is

$$t_i > k_\alpha. \qquad (2.15)$$


2.3. Test for a single unspecified outlier. Let

$$t = \max(t_1, t_2, \ldots, t_n) = \max_i \; e_i^2/(A_{ii}\, yAy'), \qquad (2.16)$$
$$u = \max(u_1, u_2, \ldots, u_n) = \max_i \; e_i|e_i|/(A_{ii}\, yAy') \qquad (2.17)$$

and

$$l = \max(l_1, l_2, \ldots, l_n) = \max_i \; {-e_i|e_i|}/(A_{ii}\, yAy'), \qquad (2.18)$$

where $A$'s are defined by (2.4) and $e$'s by (2.2) and (2.3).


When the outlier is not specified, the results of Section 2.2 immediately suggest
the use of one of the criteria $t$, $u$ or $l$, these having the common property of being the
maximum of the studentised squared deviations in some appropriate sense. The
criterion $u$ should be used when the alternative hypothesis is that the expected value
of the outlier exceeds that given by the regression model; $l$ should be used against one
sided alternatives in the other direction and $t$ against both sided alternatives. Though
from intuitive considerations these tests seem to be justifiable, their performance
will not be examined in this paper.
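
As a concrete illustration of the three criteria, the following minimal sketch computes the studentised deviations $d_i = e_i/(A_{ii}S^2)^{1/2}$ and then $t$, $u$ and $l$ as the maxima of $d_i^2$ over all $i$, over $d_i \ge 0$, and over $d_i < 0$ respectively. The function names `invert` and `outlier_criteria`, the plain-list data layout, and the small data set are assumptions made for this example only; $X$ is taken as $m$ rows of $n$ known constants with rank $m$.

```python
import math

def invert(mat):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    m = len(mat)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(m)]
           for i, row in enumerate(mat)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

def outlier_criteria(X, y):
    """Return (t, u, l) for observations y and an m x n matrix X of known constants."""
    m, n = len(X), len(y)
    xxt = [[sum(X[a][j] * X[b][j] for j in range(n)) for b in range(m)]
           for a in range(m)]
    inv = invert(xxt)

    def hat(i, j):
        # Element (i, j) of X'(XX')^{-1}X, so that A = I - hat.
        return sum(X[a][i] * inv[a][b] * X[b][j]
                   for a in range(m) for b in range(m))

    # Deviations e = y - bX, i.e. e_i = y_i - sum_j y_j * hat(j, i).
    e = [y[i] - sum(hat(j, i) * y[j] for j in range(n)) for i in range(n)]
    s2 = sum(v * v for v in e)                      # error sum of squares
    d = [e[i] / math.sqrt((1.0 - hat(i, i)) * s2) for i in range(n)]
    t = max(v * v for v in d)
    u = max(v * v if v >= 0 else 0.0 for v in d)
    l = max(v * v if v < 0 else 0.0 for v in d)
    return t, u, l

# Constant-mean model (m = 1) with one suspiciously large observation.
print(outlier_criteria([[1.0, 1.0, 1.0, 1.0]], [1.0, 2.0, 3.0, 10.0]))
```

For these four observations the deviations from the mean are $-3, -2, -1, 6$, with $S^2 = 50$ and $A_{ii} = 3/4$, so $t = u = 36/37.5 = 0.96$ and $l = 9/37.5 = 0.24$.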


The problem investigated here is the evaluation of the percentage points of 
these criteria. This will be discussed in the next section. 


3. EVALUATION OF THE PERCENTAGE POINTS OF THE 
DISTRIBUTION OF A MAXIMUM 


3.1. Nominal percentage points. Let $(v_1, v_2, \ldots, v_n)$ be $n$ random variables
and further let $v = \max(v_1, v_2, \ldots, v_n)$. Also let

$$P(V) = \text{Prob}(v > V), \qquad (3.1)$$
$$P_1(V) = \sum_{i=1}^{n} \text{Prob}(v_i > V) \qquad (3.2)$$

and

$$P_2(V) = \sum_{i=2}^{n} \sum_{j<i} \text{Prob}(v_i > V,\ v_j > V). \qquad (3.3)$$

Then we could derive from the probability laws for the joint occurrence of $n$ events,

$$P_1(V) - P_2(V) \le P(V) \le P_1(V). \qquad (3.4)$$

If $V_\alpha$ is the exact upper $100\alpha\%$ point of the distribution of $v$, then

$$P(V_\alpha) = \alpha. \qquad (3.5)$$

The quantity $\bar{V}_\alpha$ defined by

$$P_1(\bar{V}_\alpha) = \alpha \qquad (3.6)$$

will be called the nominal upper $100\alpha\%$ point of the distribution of $v$. From (3.4)
it follows that

$$P(\bar{V}_\alpha) \le \alpha. \qquad (3.7)$$

Thus, in testing any alternative hypothesis against the upper tail determined by the
nominal percentage point $\bar{V}_\alpha$, the actual level of significance cannot exceed $\alpha$. Further
if $P_2(\bar{V}_\alpha)$ happens to be zero then the nominal percentage point coincides with the
actual one; a sufficient condition for this is:

$$\text{Prob}(v_i > \bar{V}_\alpha,\ v_j > \bar{V}_\alpha) = 0 \ \text{for all } i \ne j. \qquad (3.8)$$


For the test criteria developed in Section 2.3, the actual percentage points
are difficult to evaluate and would, in general, depend on certain parametric functions
of the known variables ($x$'s) on which the observations have a linear regression.
In order to avoid this only nominal percentage points will be obtained in this paper.
However, for special types of regression functions, it will be shown that for samples of
sizes below a specified number, these nominal percentage points turn out to be the
actual ones. For this purpose we require the two lemmas proved below.

3.2. Two useful lemmas. Let $Z_1$, $Z_2$ and $W$ be three real valued random
variables such that

$$\text{Prob}(W < 0 \mid Z_1, Z_2) = 0 \ \text{for all } Z_1 \text{ and } Z_2. \qquad (3.9)$$

Let $\rho$ be a real number where $|\rho| < 1$ and let

$$Q^2 = (Z_1^2 - 2\rho Z_1 Z_2 + Z_2^2)/(1-\rho^2) + W \qquad (3.10)$$

and $R_i = Z_i/Q$ ($i = 1, 2$). Further let $h$ be any given positive number. Then we
have the following.

Lemma 1: $\text{Prob}(R_1 > h,\ R_2 > h) = 0$ if $2h^2 \ge 1+\rho$, and

Lemma 2: $\text{Prob}(|R_1| > h,\ |R_2| > h) = 0$ if $2h^2 \ge 1+|\rho|$.

Proof: Obviously,

$$U = R_1^2 - 2\rho R_1 R_2 + R_2^2 \le 1-\rho^2 \ \text{by (3.9) and (3.10).}$$

But $U = (R_1-R_2)^2 + 2R_1R_2(1-\rho)$.
Hence when $R_1 > h$ and $R_2 > h$,

$$U > 2h^2(1-\rho) \ge 1-\rho^2 \ \text{if } 2h^2 \ge 1+\rho.$$

Therefore we have Lemma 1.
Again, $U \ge |R_1|^2 - 2|\rho||R_1||R_2| + |R_2|^2 = (|R_1|-|R_2|)^2 + 2|R_1||R_2|(1-|\rho|)$.
Thus when $|R_1| > h$ and $|R_2| > h$,

$$U > 2h^2(1-|\rho|) \ge 1-\rho^2 \ \text{if } 2h^2 \ge 1+|\rho|$$

and Lemma 2 follows.


3.3. Evaluation of nominal percentage points of $u$ and $l$. Since $u = \max(u_1, u_2, \ldots, u_n)$
it follows from (3.6) and (2.10) that the nominal upper $100\alpha\%$ point of
$u$ is given by the following equation in $x$:

$$(n/2)\,[1 - I_x(\tfrac{1}{2}, (n-m-1)/2)] = \alpha. \qquad (3.11)$$

It also follows from (3.8) that a sufficient condition for this nominal percentage point
to coincide with the actual one is that

$$\text{Prob}(u_i > x,\ u_j > x) = 0 \ \text{for all } i \ne j,$$

that is, $\text{Prob}(d_i > \sqrt{x},\ d_j > \sqrt{x}) = 0$.
By an appropriate real linear transformation from $(y_1, y_2, \ldots, y_n)$ to $(z_1, z_2, \ldots, z_n)$,
$d_i$ and $d_j$ could be represented as

$$d_i = z_i/S \quad \text{and} \quad d_j = z_j/S$$

where

$$S^2 = \{(z_i^2 - 2\rho_{ij} z_i z_j + z_j^2)/(1-\rho_{ij}^2)\} + z_3^2 + z_4^2 + \cdots + z_{n-m}^2 \qquad (3.12)$$

with $\rho_{ij} = A_{ij}/(A_{ii}A_{jj})^{1/2}$ ($A$'s being defined by (2.4)) and $\text{Prob}(S < 0 \mid z_i, z_j) = 0$
for all $z_i$ and $z_j$. Applying Lemma 1 we get the following sufficient condition for
the nominal percentage point $x$ to coincide with the actual one:

$$2x \ge 1 + \rho_{ij} = 1 + A_{ij}/(A_{ii}A_{jj})^{1/2} \qquad (3.13)$$

($A$'s being defined by (2.4)).


The nominal percentage points of $l$ turn out to be the same as those of $u$ in
virtue of (2.12); (3.13) becomes a sufficient condition for the coincidence of the
nominal and actual percentage points of $l$.

3.4. Evaluation of the nominal percentage points of $t$. Since $t = \max(t_1, t_2, \ldots, t_n)$
and $t_i$ follows the Beta distribution with parameters $(\tfrac{1}{2}, (n-m-1)/2)$ (Section 2.1),
it is seen from (3.6) that the nominal upper $100\alpha\%$ point of the distribution of $t$
is given by the equation in $x$:

$$n\,[1 - I_x(\tfrac{1}{2}, (n-m-1)/2)] = \alpha. \qquad (3.14)$$

A sufficient condition for $x$ to be the actual percentage point is that

$$\text{Prob}(t_i > x,\ t_j > x) = 0 \ \text{for all } i \ne j;$$

that is, $\text{Prob}(|d_i| > \sqrt{x},\ |d_j| > \sqrt{x}) = 0$.


Proceeding as in Section 3.3 and using Lemma 2, we find that a sufficient condition
for this is

$$2x \ge 1 + |\rho_{ij}| = 1 + |A_{ij}|/(A_{ii}A_{jj})^{1/2} \qquad (3.15)$$

where $A$'s are defined by (2.4).


4. SCOPE OF THE TABLES OF NOMINAL PERCENTAGE POINTS


Tables of the nominal upper 5% and 1% points of the distribution of the criteria
$u$ (or $l$) and $t$ are presented in Tables 1 to 3 for sample sizes up to $n = 20$ and regression
variables $m = 1$, 2 and 3. These were computed by solving equations (3.11) and
(3.14) for $\alpha = 0.05$ and 0.01. For this purpose Karl Pearson's Incomplete Beta
Function Tables were used and Newton's divided difference formula was applied to
carry out inverse interpolation of the second order. The results are expected to be
correct to within one digit in the fourth decimal place.
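
The same equations can be solved directly by machine instead of by inverse interpolation in printed tables. The sketch below is one possible approach, not the method used for the tables: it evaluates $I_x(\tfrac{1}{2}, q)$ with $q = (n-m-1)/2$ through the substitution $z = \sin^2\theta$, which turns the integrand into $2\cos^{2q-1}\theta$, applies Simpson's rule, and then finds $x$ by bisection. The function names `beta_half_cdf` and `nominal_point` are hypothetical, and the sketch assumes $n \ge m+2$.

```python
import math

def beta_half_cdf(x, q, steps=2000):
    """Incomplete Beta Function ratio I_x(1/2, q), via z = sin^2(theta) and Simpson's rule."""
    theta = math.asin(math.sqrt(x))
    norm = math.gamma(0.5) * math.gamma(q) / math.gamma(q + 0.5)  # B(1/2, q)

    def f(a):
        return 2.0 * max(math.cos(a), 0.0) ** (2.0 * q - 1.0)

    h = theta / steps
    total = f(0.0) + f(theta)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0 / norm

def nominal_point(n, m, alpha, criterion="t"):
    """Nominal upper 100*alpha% point: equation (3.14) for 't', equation (3.11) for 'u' or 'l'."""
    q = (n - m - 1) / 2.0
    factor = float(n) if criterion == "t" else n / 2.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        # factor * (1 - I_x) decreases from factor to 0 as x goes from 0 to 1.
        if factor * (1.0 - beta_half_cdf(mid, q)) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# n = 4, m = 1 gives q = 1, where I_x(1/2, 1) = sqrt(x) and the roots are exact.
print(round(nominal_point(4, 1, 0.05, "u"), 4))  # 0.9506
print(round(nominal_point(4, 1, 0.05, "t"), 4))  # 0.9752
```

For $n = 4$, $m = 1$ the equations reduce to $2(1-\sqrt{x}) = 0.05$ and $4(1-\sqrt{x}) = 0.05$, whose roots $0.950625$ and $0.97515625$ agree with the corresponding entries of Table 1.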


5. SPECIAL REGRESSIONS: COINCIDENCE OF NOMINAL AND
ACTUAL PERCENTAGE POINTS

5.1. Regression on one variable ($m = 1$). For the simple case $E(y_j) = \beta$,
$j = 1(1)n$, the test criteria $u$ and $l$ reduce to those formulated by Pearson and Chandra
Sekhar (1936) and Grubbs (1950). We have, in fact,

$$u = 1 - S_n^2/S^2 \quad \text{(Grubbs)} \qquad (5.1)$$
$$\phantom{u} = \tau_n^2/(n-1) \quad \text{(Pearson and Chandra Sekhar)} \qquad (5.2)$$

and

$$l = 1 - S_1^2/S^2 \quad \text{(Grubbs)} \qquad (5.3)$$
$$\phantom{l} = \tau_1^2/(n-1) \quad \text{(Pearson and Chandra Sekhar).} \qquad (5.4)$$

It was found that the nominal percentage points of $u$ and $l$ agreed to all four places
of decimals with the actual ones obtained by Grubbs.

Since, in this case,

$$\rho_{ij} = -1/(n-1) \ \text{for all } i \ne j,$$

the nominal 5% points of $u$ (or $l$) are actual up to sample size 14 and the 1% points
up to 19. The nominal 5% points of $t$ are actual up to sample size 13 and 1% points
up to 18.
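
The sample-size limits just quoted can be checked mechanically from the coincidence conditions of Sections 3.3 and 3.4, read here as $2x \ge 1 + \rho$ for $u$ (or $l$) and $2x \ge 1 + |\rho|$ for $t$, with $\rho = -1/(n-1)$. The short sketch below does this for the 5% points; the percentage points are copied by hand from Table 1 (in units of $10^{-4}$), and the helper name `largest_actual_n` is an assumption of this example.

```python
def largest_actual_n(points, rho_bound):
    """Largest tabulated n whose nominal point x satisfies 2x >= 1 + rho_bound(n)."""
    ok = [n for n, x in points.items() if 2.0 * x >= 1.0 + rho_bound(n)]
    return max(ok) if ok else None

# Nominal 5% points from Table 1, in units of 1e-4.
u5 = {13: 4903, 14: 4660, 15: 4441}
t5 = {12: 5768, 13: 5472, 14: 5208}
u5 = {n: v / 1e4 for n, v in u5.items()}
t5 = {n: v / 1e4 for n, v in t5.items()}

# For E(y_j) = beta every rho_ij equals -1/(n-1), so |rho_ij| = 1/(n-1).
print(largest_actual_n(u5, lambda n: -1.0 / (n - 1)))  # 14
print(largest_actual_n(t5, lambda n: 1.0 / (n - 1)))   # 13
```

At $n = 14$ the $u$ condition reads $0.9320 \ge 0.9231$ and holds, while at $n = 15$ it reads $0.8882 \ge 0.9286$ and fails; the $t$ condition holds at $n = 13$ ($1.0944 \ge 1.0833$) and fails at $n = 14$.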


5.2. Regression on two variables ($m = 2$). An important type of regression is

$$E(y_j) = \beta_1 + \beta_2 j, \quad j = 1(1)n. \qquad (5.5)$$

In this case we get after simplification,

$$A = I - (a_{ij}) \qquad (5.6)$$

where $a_{ij} = \{(n^2-1) + 12(i-[n+1]/2)(j-[n+1]/2)\}/n(n^2-1)$. Therefore, $\max \rho_{ij} = 2/(n-1)$.
Thus for this regression the nominal 5% points of $u$ (or $l$) are actual up to
sample size 10, and the 1% points are actual up to 15. Again,

$$\max |\rho_{ij}| = 4(n^2-2n+13)^{-1/2}.$$

Therefore, the nominal 5% points of $t$ are actual up to sample size 9 and the 1% points
up to 14.
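
The two extreme correlations for the straight-line regression can be confirmed numerically from the matrix $A = I - (a_{ij})$. The sketch below builds $a_{ij}$ from the formula of this section, forms $\rho_{ij} = A_{ij}/(A_{ii}A_{jj})^{1/2}$ for $i \ne j$, and compares the largest $\rho_{ij}$ and the largest $|\rho_{ij}|$ with the closed forms $2/(n-1)$ and $4/\sqrt{n^2-2n+13}$. The function name `line_correlations` is hypothetical, and the sketch assumes $n \ge 3$.

```python
import math

def line_correlations(n):
    """All rho_ij (i > j) for E(y_j) = beta_1 + beta_2 * j, j = 1, ..., n."""
    c = [j - (n + 1) / 2.0 for j in range(1, n + 1)]   # centred regressor
    denom = n * (n * n - 1)
    a = [[((n * n - 1) + 12.0 * c[i] * c[j]) / denom for j in range(n)]
         for i in range(n)]
    rho = {}
    for i in range(n):
        for j in range(i):
            # Off-diagonal A_ij = -a_ij; diagonal A_ii = 1 - a_ii.
            rho[(i + 1, j + 1)] = -a[i][j] / math.sqrt((1 - a[i][i]) * (1 - a[j][j]))
    return rho

for n in (9, 10):
    rho = line_correlations(n)
    print(n,
          max(rho.values()), 2.0 / (n - 1),
          max(abs(v) for v in rho.values()), 4.0 / math.sqrt(n * n - 2 * n + 13))
```

The largest positive correlation is attained by the two end observations ($i = 1$, $j = n$) and the largest absolute correlation by two adjacent end observations ($i = 1$, $j = 2$), which is why the condition for $t$ is the more restrictive of the two.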


5.3. Regression on three variables ($m = 3$). The special regression considered is

$$E(y_j) = \beta_1 + \beta_2 \sin(2\pi j/\lambda) + \beta_3 \cos(2\pi j/\lambda) \qquad (5.7)$$

where $\lambda$ is a known positive integer and $j = 1(1)n$.
When $n$ is a multiple of $\lambda$, we have

$$A = I - (a_{ij}) \qquad (5.8)$$

where $a_{ij} = \{1 + 2\cos[2\pi(i-j)/\lambda]\}/n$.
Thus

$$\rho_{ij} = -\{1 + 2\cos[2\pi(i-j)/\lambda]\}/(n-3) \le 1/(n-3). \qquad (5.9)$$

So the nominal 5% points of $u$ (or $l$) are actual up to sample size 13 and the 1% points
up to 18 for this regression. 
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NOMINAL UPPER PERCENTAGE POINTS IN UNITS OF 107: 


TABLE 1* 
Regression : H(yj) = Вай» ў = К)» 


TABLE 2 
Е(у;) = В21,3-8222,), j = 1(1)» 


TABLE 3 
E(yj) = Взял В, 
48318,7 = 1(1)» 


worl - wor 
sample + sample le 
size 596 1% 5% 1% size 5% 1% 5% 1% size 5% 1% 
(1) (2) (3) (4) (5) а) (2) (8) (4) (5) (1) (2) (8) 
3 9973 9998 9993 10000 .9985 .9999 .9996 10000 5 .9993 10 
4 9506 9900 9752 9950 9604 9920 9801 9960 ы 09 
5 8730 _ 9558 9192 9721 
6 7968 9072 8547 9340 6 8872 9608 9283 9753 6 9669 9933 
7 7304 8553 7934 8897 i 8114 9140 8652 9389 7 8070 9645 
8 6739 8052 17384 8446 8 7438 8626 8038 8954 8 8981 9195 
9 6258 7589 6899 8011 9 6858 8125 7481** 8504 9 7551 8688 
10 5846 7169 6474 7606 10 6363** 7660 6987 8069 10 6961 8188 
11 5489 6789 6099 7233 11 5938 7233 6553 17661 11 6455 7720 
12 5178 0446 5768 6890 12 5570 6848 6170 7284 12 6020 7201 
13 4903 6136  5472** 6576 13 5249 6500 5831 6938 13 5644**. 6901 
14 4660 5855 5208 6289 14 4967 6185 5530 6621%% | 14 5815 6549 
15 4441 5599 4970 6026 15 4716 5900** 5259 6330 15 5026 6231 
16 4945 5366 4754 5784 16 4493 5640 5011 6064. 16 4760 5942 
17 4067. 5152 4558 5561 17 4291 5404 4197 5820 17 4540 5679 
18 8005 4956 4379 5356** 18 4109 5187 4597 5594 18 4334 5420%% 
19 8757 4775 4215 5165 19 3043 4988 4416 5386 19 4149 5220 
20 3621 4607 4063 4989 20 3792 4805 4248 5094 20 3979 5018. 
*Table 1 provides actual percentage points of u and l for the regression $E(y_j) = \beta$ (Section 5.1).
**Percentage points are actual up to and including this sample size for the regression
Table 1: $E(y_j) = \beta$; Table 2: $E(y_j) = \beta_1 + \beta_2 j$ (Section 5.2); Table 3: $E(y_j) = \beta_1 + \beta_2 \sin(2\pi j/\lambda) + \beta_3 \cos(2\pi j/\lambda)$, j = 1(1)n, where $\lambda$ is a positive integer and n is a multiple of $\lambda$ (Section 5.3).

Note: 1. u, l and t are given by equations (2.16), (2.17) and (2.18).

2. Nominal percentage points of u and l satisfy (3.11) and those of t satisfy (3.14).

3. Condition for coincidence of the nominal and actual percentage points of u and l is given by
(3.13) and for t by (3.15).

4. In virtue of (3.13) and (3.15) the nominal upper 100α percent points of u (or l) are the same as
the 200α percent points of t. Thus the nominal 1% and 5% points of u could be used
as the 2% and 10% points of t, and the nominal 1% and 5% points of t could be used as the 0.5% and
2.5% points of u (or l).
MINIMAL SUFFICIENT STATISTICS FOR THE BALANCED
INCOMPLETE BLOCK DESIGN UNDER AN 
EISENHART MODEL II* 


By DAVID L. WEEKS 
and 
FRANKLIN A. GRAYBILL 
Oklahoma State University 


SUMMARY. The authors present in this paper a set of minimal sufficient statistics for a
General Balanced Incomplete Block model under the assumptions that the treatments and the errors
are random variables, i.e., Eisenhart Model II.


1. INTRODUCTION 


Sufficient statistics play a very important role in the theory of minimum
variance unbiased estimation [(Rao, 1946); (Blackwell, 1947)]. If a minimal set of
sufficient statistics can be found for a particular distribution, then the Rao-Blackwell
theorem states that minimum variance unbiased estimators must be an explicit function
of this set. If the minimal set is also complete (Lehmann and Scheffé, 1950), then
an unbiased estimator based on this complete set is the unique minimum variance
unbiased estimator. In confidence interval estimation and tests of hypotheses minimal
sufficient statistics play a similar role.


The distribution of a set of minimal sufficient statistics can be considered
as the canonical form of the distribution of the original observations since this set
represents the maximum reduction possible without losing "information." Therefore,
it is quite useful to be able to exhibit a set of minimal sufficient statistics and their
joint distribution.

The purpose of this paper is to exhibit a set of minimal sufficient statistics for
a general balanced incomplete block model under the assumptions that the blocks,
the treatments, and the errors are random variables; i.e., Eisenhart's Model II
(Eisenhart, 1947).


2. DEFINITIONS AND ASSUMPTIONS 


A balanced incomplete block is defined as a design in which there are t treatments
and b blocks of k experimental units per block (where k < t) with each treatment
replicated r times. The arrangement of treatments is such that every pair of treatments
occurs together in exactly λ blocks.
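The definition can be checked mechanically on a small design. The sketch below is only an illustration: the seven blocks (the Fano plane, with t = b = 7, k = r = 3, λ = 1) are an assumed example and do not come from the paper. It verifies equal replication, equal pairwise concurrence, the counting identities bk = tr and r(k−1) = λ(t−1), and the form NN' = (r−λ)I + λJ of the treatment-by-block incidence product used later.

```python
from itertools import combinations

# Assumed illustrative design: the (t, b, k, r, lambda) = (7, 7, 3, 3, 1) Fano plane.
blocks = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
t, b, k = 7, len(blocks), 3

# Replication count of each treatment and concurrence count of each treatment pair.
reps = [sum(j in blk for blk in blocks) for j in range(t)]
conc = {pair: sum(set(pair) <= set(blk) for blk in blocks)
        for pair in combinations(range(t), 2)}
r, lam = reps[0], conc[(0, 1)]

# A design is balanced incomplete if blocks have k distinct treatments,
# every treatment is replicated r times, and every pair concurs lam times.
is_bib = (all(len(set(blk)) == k for blk in blocks)
          and all(x == r for x in reps)
          and all(c == lam for c in conc.values()))

# Incidence matrix N (t x b) and the product NN' (t x t).
N = [[1 if j in blk else 0 for blk in blocks] for j in range(t)]
NNt = [[sum(N[i][m] * N[j][m] for m in range(b)) for j in range(t)] for i in range(t)]
expected = [[r if i == j else lam for j in range(t)] for i in range(t)]  # (r - lam) I + lam J

print(is_bib, (t, b, k, r, lam), NNt == expected)
print(b * k == t * r, r * (k - 1) == lam * (t - 1))
```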


* Research sponsored in part by National Science Foundation, Grant No. NSF-G-3970. 
The model may be written as a special case of the general two-way classification
(Kempthorne, 1952), i.e.,
\[ Y_{ijm} = \mu + \beta_i + \tau_j + e_{ijm}, \tag{2.1} \]
where $i = 1, 2, \ldots, b$; $j = 1, 2, \ldots, t$; $m = n_{ij}$; where
\[ n_{ij} = \begin{cases} 1 & \text{if treatment } j \text{ occurs in block } i, \\ 0 & \text{otherwise.} \end{cases} \]

It should be noted that only the $Y_{ijm}$ in which $m \neq 0$ are observed.
Equation (2.1) represents bk = n equations and these equations may be written
in matrix form as
\[ Y = \mu X_0 + X_1\beta + X_2\tau + e, \tag{2.2} \]

where the dimensions of the matrices in (2.2) are as follows: $Y$ (n × 1); $X_1$ (n × b); $\beta$ (b × 1);
$X_2$ (n × t); $\tau$ (t × 1); $e$ (n × 1); $\mu$ (1 × 1); with $X_0$ an n × 1 vector with each element equal
to one. ($j_s$ will be used to denote an s × 1 vector of ones and $J_s^q$ an s × q matrix of ones.)
We will denote $X_2'X_1$ by $N$ and the q × q identity matrix by $I_q$.

Under an Eisenhart Model II we shall assume the following distributional
properties of the vectors $\beta$, $\tau$, and $e$.

(a) $\beta$ is distributed as the multivariate normal, mean 0, covariance
matrix $\sigma_1^2 I_b$.

(b) $\tau$ is distributed as the multivariate normal, mean 0, covariance
matrix $\sigma_2^2 I_t$.

(c) $e$ is distributed as the multivariate normal, mean 0, covariance
matrix $\sigma^2 I_n$.

(d) cov($\beta$, $\tau$) = 0, cov($\beta$, $e$) = 0, cov($\tau$, $e$) = 0.

(e) $\mu$ is a scalar constant.

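Under these assumptions, the joint distribution of the elements of Y is
given by
\[ g(Y; \theta) = \frac{1}{(2\pi)^{n/2}|V|^{1/2}} \exp\{-\tfrac{1}{2}(Y-\eta)'V^{-1}(Y-\eta)\}, \tag{2.3} \]
where $\eta = \mu X_0$, $\theta' = (\mu, \sigma^2, \sigma_1^2, \sigma_2^2)$ and
\[ V = X_1X_1'\sigma_1^2 + X_2X_2'\sigma_2^2 + I_n\sigma^2. \]

The following relationships hold for the matrix model: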
(а) ХІХ, = М, (9) HX; = J (в) ММ” --(ғ-А),--АЛ 
(b) XX, =r, (ө) JX, = rJ; (bh) (Xj—E3NXj)X, = ЛКИ, Л) 
O Қа-ы O ла-қ 6 оке | 
3. DEVELOPMENT OF A SET OF MINIMAL SUFFICIENT STATISTICS

In this section we will define an orthogonal matrix P in such a way that by
transforming Y through the transformation P, a set of sufficient statistics will be
obtained. Let $P_3$ be a set of t−1 orthogonal t × 1 vectors such that
\[ P_3'NN'P_3 = D_1, \]
where $D_1$ is diagonal with t−1 of the characteristic roots of NN' on the diagonal.
The characteristic roots of NN' are rk and (r−λ) of multiplicities 1 and t−1, respectively.
Let $P_3^*$ be constructed such that
\[ P_3^{*\prime}NN'P_3^* = \begin{pmatrix} rk & 0 \\ 0 & (r-\lambda)I_{t-1} \end{pmatrix}. \]
Let the first row of $P_3^{*\prime}$ be $t^{-1/2}j_t'$ and $P_3$ be a set of t−1 t × 1 vectors which are
orthogonal to $t^{-1/2}j_t$. Hence
\[ P_3^{*\prime} = \begin{pmatrix} t^{-1/2}j_t' \\ P_3' \end{pmatrix}. \]
Then from the form of NN' in the BIB,
\[ \begin{pmatrix} t^{-1/2}j_t' \\ P_3' \end{pmatrix} NN' \,(t^{-1/2}j_t,\; P_3) = \begin{pmatrix} rk & 0 \\ 0 & (r-\lambda)I_{t-1} \end{pmatrix}. \]

Let $P_2$ be a set of b−1 orthogonal vectors such that
\[ P_2'N'NP_2 = D_2, \]
where $D_2$ is diagonal with the characteristic roots of N'N other than rk on the diagonal.

Since the non-zero characteristic roots of NN' and N'N are equal and of the
same multiplicity, partition $P_2$ into $(P_{21}, P_{22})$ such that
\[ \begin{pmatrix} P_{21}' \\ P_{22}' \end{pmatrix} N'N\,(P_{21}, P_{22}) = \begin{pmatrix} 0_{b-t}^{b-t} & 0 \\ 0 & (r-\lambda)I_{t-1} \end{pmatrix}, \]
where $0_r^q$ denotes an r × q matrix of zeros. It can be shown that $P_{22}$ can be expressed
in terms of $P_3$ by the relation
\[ P_{22} = (r-\lambda)^{-1/2}N'P_3. \]
Let $A' = (X_2' - k^{-1}NX_1')$ and consider A'A. The t−1 t × 1 vectors
$P_3$ also diagonalize A'A since $A'A = rI_t - k^{-1}NN'$. Hence, using $r(k-1) = \lambda(t-1)$ so that $r - k^{-1}(r-\lambda) = \lambda t/k$,
\[ P_3'A'AP_3 = \frac{\lambda t}{k} I_{t-1}. \]
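The identity $A'A = rI_t - k^{-1}NN'$ and the eigenvalue λt/k can be checked exactly on a small design. The following is a sketch under stated assumptions: the Fano-plane blocks are an illustrative choice (not from the paper), N is taken as $X_2'X_1$ (treatments by blocks), and the helper names `matmul` and `transpose` are hypothetical.

```python
from fractions import Fraction

# Assumed illustrative design: Fano plane, t = b = 7, k = r = 3, lambda = 1.
blocks = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
t, b, k, r, lam = 7, 7, 3, 3, 1
units = [(i, j) for i, blk in enumerate(blocks) for j in blk]  # one row per observation
n = len(units)

def matmul(A, B):
    return [[sum(a * c for a, c in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

X1 = [[1 if i == bi else 0 for bi in range(b)] for (i, j) in units]  # n x b block indicators
X2 = [[1 if j == tj else 0 for tj in range(t)] for (i, j) in units]  # n x t treatment indicators
N = matmul(transpose(X2), X1)                                        # t x b incidence matrix

# A = X2 - k^{-1} X1 N', so that A' = X2' - k^{-1} N X1'.
X1Nt = matmul(X1, transpose(N))
A = [[Fraction(X2[u][j]) - Fraction(X1Nt[u][j], k) for j in range(t)] for u in range(n)]

AtA = matmul(transpose(A), A)
NNt = matmul(N, transpose(N))
target = [[(r if i == j else 0) - Fraction(NNt[i][j], k) for j in range(t)] for i in range(t)]
AtX1 = matmul(transpose(A), X1)

# Any contrast (a vector orthogonal to the vector of ones) is an eigenvector with root lam*t/k.
v = [1, -1, 0, 0, 0, 0, 0]
AtA_v = [sum(AtA[i][j] * v[j] for j in range(t)) for i in range(t)]

print(AtA == target, all(x == 0 for row in AtX1 for x in row))
```

Exact rational arithmetic is used so that the equalities can be tested without a floating-point tolerance.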


Now let $X = (X_0, X_1, X_2)$. Since XX' is symmetric, there exists an orthogonal
matrix Q such that Q'XX'Q = D where D is diagonal. The rank of XX' for
the balanced incomplete block design is b+t−1. Therefore, if we partition Q into
$(Q_1, P_4)$, where $Q_1$ and $P_4$ are of dimension n × (b+t−1) and n × (n−b−t+1) respectively,
we may write
\[ \begin{pmatrix} Q_1' \\ P_4' \end{pmatrix} XX'\,(Q_1, P_4) = \begin{pmatrix} Q_1' \\ P_4' \end{pmatrix} (X_0X_0' + X_1X_1' + X_2X_2')\,(Q_1, P_4). \]

Hence
\[ P_4'(X_0X_0' + X_1X_1' + X_2X_2')P_4 = 0. \]

Since the matrices $X_0X_0'$, $X_1X_1'$ and $X_2X_2'$ are positive semi-definite, it follows that
$P_4$ is a set of n−b−t+1 orthogonal vectors such that $P_4'X_0 = 0$, $P_4'X_1 = 0$ and
$P_4'X_2 = 0$.


We now define an orthogonal matrix P as follows:
\[ P' = \begin{pmatrix} n^{-1/2}j_n' \\ k^{-1/2}P_{21}'X_1' \\ k^{-1/2}P_{22}'X_1' \\ (k/\lambda t)^{1/2}P_3'A' \\ P_4' \end{pmatrix}. \]

Consider now
\[ g(Y; \theta) = \frac{1}{(2\pi)^{n/2}|V|^{1/2}} \exp\{-\tfrac{1}{2}(Y-\eta)'PP'V^{-1}PP'(Y-\eta)\}. \]

By first finding the form of P'VP and using the fact that
\[ (P'V^{-1}P) = (P'VP)^{-1}, \]
we have the result as shown in Table 1.


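TABLE 1. FORM OF $P'V^{-1}P$

\[ P'V^{-1}P = \begin{pmatrix}
(\sigma^2 + k\sigma_1^2 + r\sigma_2^2)^{-1} & 0 & 0 & 0 & 0 \\
0 & (\sigma^2 + k\sigma_1^2)^{-1}I_{b-t} & 0 & 0 & 0 \\
0 & 0 & d_1^{-1}(\sigma^2 + k^{-1}\lambda t\sigma_2^2)I_{t-1} & -[k^{-2}\lambda t(r-\lambda)]^{1/2}\sigma_2^2 d_1^{-1}I_{t-1} & 0 \\
0 & 0 & -[k^{-2}\lambda t(r-\lambda)]^{1/2}\sigma_2^2 d_1^{-1}I_{t-1} & [\sigma^2 + k\sigma_1^2 + k^{-1}(r-\lambda)\sigma_2^2]d_1^{-1}I_{t-1} & 0 \\
0 & 0 & 0 & 0 & \sigma^{-2}I_{n-b-t+1}
\end{pmatrix} \]

\[ d_1 = \sigma^4 + k\sigma^2\sigma_1^2 + r\sigma^2\sigma_2^2 + \lambda t\sigma_1^2\sigma_2^2 \]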

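The vector $P'(Y-\eta)$ can be shown to be equal to the following:
\[ P'(Y-\eta) = \begin{pmatrix} \sqrt{n}(\bar{y}_{...} - \mu) \\ k^{-1/2}P_{21}'X_1'Y \\ k^{-1/2}P_{22}'X_1'Y \\ (k/\lambda t)^{1/2}P_3'A'Y \\ P_4'Y \end{pmatrix}, \]
where $\bar{y}_{...} = n^{-1}\sum Y_{ijm}$.
If we let $q = (Y-\eta)'PP'V^{-1}PP'(Y-\eta)$,
we have
\[ \begin{aligned} q ={}& (\sigma^2 + k\sigma_1^2 + r\sigma_2^2)^{-1}n(\bar{y}_{...} - \mu)^2 + [k(\sigma^2 + k\sigma_1^2)]^{-1}Y'X_1P_{21}P_{21}'X_1'Y \\ &+ k^{-1}d_1^{-1}(\sigma^2 + k^{-1}\lambda t\sigma_2^2)Y'X_1P_{22}P_{22}'X_1'Y + \sigma^{-2}Y'P_4P_4'Y \\ &+ (k/\lambda t)[\sigma^2 + k\sigma_1^2 + k^{-1}(r-\lambda)\sigma_2^2]d_1^{-1}Y'AP_3P_3'A'Y \\ &- 2k^{-1}(r-\lambda)^{1/2}\sigma_2^2d_1^{-1}Y'X_1P_{22}P_3'A'Y. \end{aligned} \]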
From the form of q we may define six statistics as follows:
\[ \begin{aligned}
s_1 &= \bar{y}_{...} \\
s_2 &= k^{-1}Y'X_1P_{21}P_{21}'X_1'Y \quad (\text{not defined if } b = t) \\
s_3 &= k^{-1}Y'X_1P_{22}P_{22}'X_1'Y \\
s_4 &= k^{-1}(r-\lambda)^{1/2}Y'X_1P_{22}P_3'A'Y \\
s_5 &= (k/\lambda t)Y'AP_3P_3'A'Y \\
s_6 &= Y'P_4P_4'Y
\end{aligned} \tag{3.1} \]

These six statistics are, by definition, sufficient for the parameters $\mu$, $\sigma^2$, $\sigma_1^2$,
and $\sigma_2^2$.


We shall now prove that this set of sufficient statistics is minimal for $g(Y; \theta)$.
$g(Y; \theta)$ may be written in the form
\[ g(Y; \theta) = P(\theta)\,Q(Y)\exp\Big\{\sum_{i=1}^{k} v_i(\theta)w_i(Y)\Big\}. \]

A necessary and sufficient condition for the set of sufficient statistics w(Y) to be minimal
for $g(Y; \theta)$ is that there exist no non-zero constants $a_1, a_2, \ldots, a_k, c$ such that
\[ \sum_{i=1}^{k} a_iv_i(\theta) = c. \tag{3.2} \]


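Thus it is enough to prove that for the following seven functions:
\[ \begin{aligned}
v_1 &= (\sigma^2 + k\sigma_1^2 + r\sigma_2^2)^{-1} \\
v_2 &= (\sigma^2 + k\sigma_1^2)^{-1} \\
v_3 &= [\sigma^2 + k\sigma_1^2 + k^{-1}(r-\lambda)\sigma_2^2]d_1^{-1} \\
v_4 &= [\sigma^2 + k^{-1}\lambda t\sigma_2^2]d_1^{-1} \\
v_5 &= -2\sigma_2^2d_1^{-1} \\
v_6 &= \sigma^{-2} \\
v_7 &= \mu(\sigma^2 + k\sigma_1^2 + r\sigma_2^2)^{-1}
\end{aligned} \tag{3.3} \]

(3.2) is not true for any $a_1, a_2, \ldots, a_7$ and c except when all of them vanish (Rao, 1945).

In (3.3) it is clear that $\mu$ appears only in $v_7$. Since $v_1, v_2, \ldots, v_6$ are homogeneous
functions of $\sigma$, $\sigma_1$ and $\sigma_2$ of degree −2, the constant c can only be zero.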
Effecting the linear transformation $x = \sigma^2$; $y = \sigma^2 + k\sigma_1^2$; $z = \sigma^2 + k\sigma_1^2 + r\sigma_2^2$,
the functions in (3.3) become
\[ \begin{aligned}
v_1 &= xy\Big[xz + \frac{\lambda t}{rk}(y-x)(z-y)\Big]D^{-1} \\
v_2 &= xz\Big[xz + \frac{\lambda t}{rk}(y-x)(z-y)\Big]D^{-1} \\
v_3 &= xyz\Big[y + \frac{r-\lambda}{rk}(z-y)\Big]D^{-1} \\
v_4 &= xyz\Big[x + \frac{\lambda t}{rk}(z-y)\Big]D^{-1} \\
v_5 &= -2xyz\Big[\frac{z-y}{r}\Big]D^{-1} \\
v_6 &= yz\Big[xz + \frac{\lambda t}{rk}(y-x)(z-y)\Big]D^{-1}
\end{aligned} \]
where $D = xyz\big[xz + \frac{\lambda t}{rk}(y-x)(z-y)\big]$.

Observe that the term $x^2y^2$ appears only in $v_1$, $x^2z^2$ appears only in $v_2$, and
$y^2z^2$ appears only in $v_6$. This being the case, $v_1$, $v_2$ and $v_6$ are mutually linearly independent
and linearly independent of $v_3$, $v_4$, and $v_5$. Now observe that after removing
the common factor xyz in $v_3$, $v_4$, and $v_5$, these are also linearly independent, thereby
proving that (3.2) is not true unless $a_1, a_2, \ldots, a_7$ and c vanish. This condition then
implies that the set of sufficient statistics defined in (3.1) is minimal.


In Table 2, the relationship of these six statistics to the intra-block analysis
of variance (Kempthorne, 1952), the expectation of the statistics, the pairwise independence,
and the distribution of each is given. Concerning Table 2,

(a) the relationship for $s_2$ holds only if b > t; $s_2$ is not defined if b = t,
in which case there are but five statistics in a minimal set of sufficient statistics;

(b) $\chi^2(p)$ denotes the central chi-square distribution with p degrees of freedom;

(c) the р; are the non-zero characteristic roots of А441) where 
Ay = kAX,N'OG- E333); 

(d) $B_i$ is the i-th block total, $T_j$ is the total of all blocks containing treatment
j, $V_j$ is the j-th treatment total and $Q_j = V_j - k^{-1}T_j$. (Note also that $Q_j$ has no relationship
to the orthogonal matrix Q mentioned previously.)
Then we can define a random variable X taking non-negative integral values
in T with probabilities
\[ P_x = \text{Prob}\{X = x\} = \frac{a_x\theta^x}{f(\theta)}, \quad x \in T, \tag{1.3} \]
and call this distribution analogously a generalized power series distribution (gpsd).
It may be noted that gpsd reduces to a psd when T is the entire set of non-negative
integers. The properties established by Noack (1950) and Khatri (1959) for psd can
be easily deduced for gpsd by following the same lines. Further, it can be easily
seen that proper choice of T and $f(\theta)$ reduces the gpsd, in particular, to the Binomial,
negative Binomial, Poisson and logarithmic series distributions and their truncated
forms. Incidentally, it is obvious that a truncated gpsd is itself a gpsd in its own
right and hence the properties that hold for a gpsd continue to hold for its truncated
forms.


Problems of statistical inference associated with psd's do not seem to have
been much investigated. Roy and Mitra (1957) have derived the uniformly minimum
variance unbiased estimates in certain particular cases and have provided necessary
tables for the Poisson distribution truncated at zero. The author (1957) has shown that
for gpsd (1.2), the maximum likelihood method and the method of moments give the
same estimate of the parameter of the gpsd. The likelihood equation and a method
for solving it are derived for the problem of estimation. In this paper, we suggest
what we call the "Ratio Method" for estimation of the parameter of the gpsd and
investigate its important properties and study certain applications.


2. ESTIMATION BY THE RATIO METHOD FOR A G.P.S.D.
[Range T finite and T = (c, c+1, ..., c+k = d) with positive probabilities].
The gpsd that we consider here is of the form:
\[ P_x = \text{Prob}(X = x) = \frac{a_x\theta^x}{f(\theta)}, \tag{2.1} \]
where $x \in T = (c, c+1, \ldots, c+k = d)$, d finite,
\[ f(\theta) = \sum_{x=c}^{d} a_x\theta^x \tag{2.2} \]
and $a_x > 0$ for $x \in T$.
Let
\[ g(x) = \frac{a_{x-r}}{a_x}, \tag{2.3} \]
with r being an integer such that $x - r \in T$. Then
\[ \sum_{x=u}^{v} g(x)P_x = \theta^r\sum_{x=u-r}^{v-r} P_x, \tag{2.4} \]
where u and v are arbitrary with $c + r \le u < v \le d$. From (2.4) we get the identity
\[ \theta^r = \sum_{x=u}^{v} g(x)P_x \Big/ \sum_{x=u-r}^{v-r} P_x, \tag{2.5} \]

which can be made use of in problems of estimation. In a sample of size N, if $n_x$ is
the observed frequency for x, then since $E(n_x) = NP_x$, the statistic
\[ \sum_{x=u}^{v} g(x)n_x \Big/ \sum_{x=u-r}^{v-r} n_x \tag{2.6} \]


may be taken as an estimate of $\theta^r$ for admissible values of r = 1, 2, etc. Since u
and v are arbitrary, the same method is applicable for estimation in truncated and
censored gpsd's also, provided that their range contains a subset of consecutive
integers. We call these estimates "ratio estimates."
It is interesting to note that Plackett (1953) and Moore (1952, 1954) applied
this ratio method to the special cases of estimating θ in truncated Binomial and
Poisson distributions. The method which we call the ratio method is applicable
not merely for estimating θ, but also for its integral powers and for any gpsd of this
section, truncated or censored.
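A minimal sketch of the statistic (2.6) may help fix the index ranges. The function name `ratio_estimate` and the example distribution (Poisson-type coefficients $a_x = 1/x!$ on T = {1, ..., 6}, with θ = 1.3) are assumptions made for illustration. Feeding the exact probabilities $P_x$ in place of the frequencies $n_x$ reproduces identity (2.5), so the function should return $\theta^r$ up to rounding error.

```python
from math import factorial

def ratio_estimate(a, freq, r, u, v):
    """Ratio statistic (2.6): sum_{x=u}^{v} (a_{x-r}/a_x) n_x / sum_{x=u-r}^{v-r} n_x.

    a    : dict mapping x to the coefficient a_x > 0 of the gpsd
    freq : dict mapping x to the observed frequency n_x (or to P_x)
    Requires c + r <= u < v <= d, where c and d are the end points of the range T.
    """
    num = sum(a[x - r] / a[x] * freq.get(x, 0) for x in range(u, v + 1))
    den = sum(freq.get(x, 0) for x in range(u - r, v - r + 1))
    return num / den

# Illustrative gpsd: a_x = 1/x! on T = {1, ..., 6} (a doubly truncated Poisson).
theta, c, d = 1.3, 1, 6
a = {x: 1 / factorial(x) for x in range(c, d + 1)}
f_theta = sum(a[x] * theta ** x for x in a)
P = {x: a[x] * theta ** x / f_theta for x in a}

# With exact probabilities as "frequencies", identity (2.5) gives theta**r.
theta_sq = ratio_estimate(a, P, r=2, u=3, v=6)
theta_1 = ratio_estimate(a, P, r=1, u=2, v=6)
print(theta_sq, theta_1)
```

With sample frequencies the same call gives the ratio estimate of $\theta^r$; different admissible choices of u and v then generally give different estimates.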
The ratio estimate is not generally unbiased or efficient, but is always easy
to compute. In certain cases (see Section 3), however, unbiased estimates can be
obtained by the ratio method. In other cases, such as those in this section, the bias
is generally of the order 1/N. It may be easily verified that no unbiased estimate for
θ (and $\theta^r$ in general) exists when the range of T is finite as in situations considered here.

Consider the following ratio estimate of θ for gpsd (2.1):
\[ \theta' = t_1/t_2, \tag{2.7} \]
where
\[ t_1 = \sum_{x=c+1}^{d} \frac{a_{x-1}}{a_x}n_x \tag{2.8} \]
and
\[ t_2 = \sum_{x=c}^{d-1} n_x. \tag{2.9} \]
Then, writing
\[ E(t_2) = N\sum_{x=c}^{d-1} P_x = N(1 - P_d) = NP, \text{ say}, \tag{2.10} \]
where
\[ P = 1 - P_d, \tag{2.11} \]
we have
\[ E(t_1) = NP\theta. \tag{2.12} \]
Let
\[ t_1 - E(t_1) = \delta t_1 \quad \text{and} \quad t_2 - E(t_2) = \delta t_2. \tag{2.13} \]
Then
\[ \theta' = \frac{t_1}{t_2} = \theta\Big(1 + \frac{\delta t_1}{NP\theta}\Big)\Big(1 + \frac{\delta t_2}{NP}\Big)^{-1}. \]
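Since the deviations $\delta t_1$, $\delta t_2$ are stochastically of order $N^{1/2}$, we get on expansion
\[ \theta' = \theta\Big[1 + \frac{\delta t_1}{NP\theta} - \frac{\delta t_2}{NP} - \frac{(\delta t_1)(\delta t_2)}{N^2P^2\theta} + \frac{(\delta t_2)^2}{N^2P^2}\Big], \tag{2.14} \]
neglecting terms of order higher than 1/N. Thus, to this order of approximation,
\[ E(\theta') = \theta\Big[1 + \frac{E(\delta t_2)^2}{N^2P^2} - \frac{E(\delta t_1)(\delta t_2)}{N^2P^2\theta}\Big]. \tag{2.15} \]
Now a little computation gives
\[ E(\delta t_2)^2 = NP(1-P) \tag{2.16} \]
and
\[ E(\delta t_1)(\delta t_2) = N\theta[P(1-P) - P_{d-1}]. \tag{2.17} \]
Thus
\[ E(\theta') = \theta\Big(1 + \frac{1}{N}\frac{P_{d-1}}{P^2}\Big), \tag{2.18} \]
from which we get the magnitude of the bias of θ', to order 1/N,
\[ b(\theta') = \frac{\theta P_{d-1}}{NP^2} = \frac{1}{N}B(\theta'), \text{ say}. \tag{2.19} \]
The variance of θ' correct to terms of order 1/N is
\[ \text{Var}(\theta') = \frac{1}{N^2P^2}[E(\delta t_1)^2 + \theta^2E(\delta t_2)^2 - 2\theta E(\delta t_1)(\delta t_2)]. \tag{2.20} \]
Now
\[ E(\delta t_1)^2 = N(D - P^2\theta^2), \tag{2.21} \]
where
\[ D = \sum_{x=c+1}^{d}\Big(\frac{a_{x-1}}{a_x}\Big)^2P_x. \tag{2.22} \]
Thus, to order 1/N,
\[ \text{Var}(\theta') = \frac{1}{NP^2}[D - P\theta^2 + 2\theta^2P_{d-1}]. \tag{2.23} \]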

3. UNBIASED ESTIMATION BY THE RATIO METHOD FOR A G.P.S.D.
[Range T infinite and T = (c, c+1, ...) with positive probabilities].

Uniformly minimum variance unbiased estimation for psd's has been considered
by Roy and Mitra (1957). Tate and Goen (1958) have considered the same for
truncated Poisson.
It is easy to demonstrate that the ratio method discussed in Section 2 gives
the unique unbiased estimate of θ, linear in frequencies, for a gpsd with range T infinite
and T = (c, c+1, ...) with positive probabilities. For, consider the gpsd
\[ P_x = \text{Prob}(X = x) = \frac{a_x\theta^x}{f(\theta)}, \quad x = c, c+1, \ldots, \infty, \tag{3.1} \]
where
\[ f(\theta) = \sum_{x=c}^{\infty} a_x\theta^x \tag{3.2} \]
and $a_x > 0$ for all x = c, c+1, ....

Now, if in a sample of size N from gpsd (3.1), the frequency of x is $n_x$ and we
want an unbiased estimate for θ of the type linear in $n_x$, we should be able to demonstrate
the existence of a function of x, h(x), such that, denoting the corresponding estimate
\[ \tilde{\theta} = \frac{1}{N}\sum_{x=c}^{\infty} h(x)n_x, \tag{3.3} \]
we must have $E(\tilde{\theta}) = \theta$ for all θ in the parameter space of (3.1). That is
\[ \sum_{x=c}^{\infty} h(x)a_x\theta^x = \sum_{x=c}^{\infty} a_x\theta^{x+1}. \]

Since this is an identity in θ, equating coefficients of corresponding powers of
θ, we get
\[ h(x) = \begin{cases} 0 & \text{for } x = c, \\ a_{x-1}/a_x & \text{for } x = c+1, c+2, \ldots. \end{cases} \]

Thus, the unique unbiased estimate of θ linear in the frequencies comes out
to be the ratio estimate θ'. The exact variance of this estimate is
\[ \sigma^2(\theta') = \frac{1}{N}\Big[\sum_{x=c+1}^{\infty}\Big(\frac{a_{x-1}}{a_x}\Big)^2P_x - \theta^2\Big]. \tag{3.4} \]

An unbiased estimate of $\sigma^2(\theta')$ is
\[ \frac{1}{N(N-1)}\Big[\sum_{x=c+1}^{\infty}\Big(\frac{a_{x-1}}{a_x}\Big)^2n_x - N\theta'^2\Big], \tag{3.5} \]
the proof of which is almost immediate once one recognizes that θ' is the mean of
N independent identically distributed random variables $Y_i$ with probability distribution
given by (for i = 1, 2, ..., N)
\[ \text{Prob}\{Y_i = 0\} = P_c \]
and
\[ \text{Prob}\{Y_i = a_{x-1}/a_x\} = P_x, \quad x = c+1, c+2, \ldots. \]
One can compare $\sigma^2(\theta')$ with the asymptotic variance $\text{Var}(\hat{\theta})$ of the maximum likelihood
estimate of θ and the efficiency of the ratio estimate θ' can be computed.

Lastly, one may establish that
\[ \frac{1}{N}\sum_{x=c+r}^{\infty}\frac{a_{x-r}}{a_x}n_x \tag{3.6} \]
is the only unbiased estimate of $\theta^r$ (r an integer) which is a linear function of the
frequencies.


4. ESTIMATION BY THE RATIO METHOD FOR SINGLY TRUNCATED 
BINOMIAL DISTRIBUTION 


Fisher (1936) and Haldane (1932, 1938) discussed uses of the truncated
binomial distribution. For instance, in problems of human genetics, in estimating
the proportion of albino children produced by couples capable of producing albinos,
sampling has necessarily to be restricted to families having at least one albino child.
Finney (1949) has cited some more applications. Fisher and Haldane derived the
maximum likelihood procedure to estimate the parameter π. Patil (1959) gave a
direct method to obtain the maximum likelihood estimate. Moore (1954) suggested
a simple "ratio-estimate" based on an identity between binomial probabilities. For
a slightly different problem, where, in a sample from a complete binomial distribution,
the frequencies in some lowest classes are missing, Rider (1955) suggested a method of
estimation, which uses the first two moments of the complete binomial and leads to a
linear equation.

The probability law of the binomial distribution truncated at c on the left
can be written as
\[ b^*(x, \pi, n) = [B^*(c, \pi, n)]^{-1}\binom{n}{x}\pi^x(1-\pi)^{n-x}, \quad x = c, c+1, \ldots, n, \tag{4.1} \]
where
\[ B^*(c, \pi, n) = \sum_{x=c}^{n}\binom{n}{x}\pi^x(1-\pi)^{n-x}. \tag{4.2} \]
The first two moments about the origin of (4.1), then, are
\[ \mu^* = \mu^*(c, \pi, n) = n\pi\,B^*(c-1, \pi, n-1)/B^*(c, \pi, n) \tag{4.3} \]
and
\[ m_2^* = m_2^*(c, \pi, n) = \mu^*(c, \pi, n)\{1 + \mu^*(c-1, \pi, n-1)\}. \tag{4.4} \]

(The case of truncation to the right can be dealt with in a similar way by replacing
π by 1−π and the truncation point c by n−c.)
In this case $a_{x-1}/a_x = x/(n-x+1)$ and since $\theta = \pi/(1-\pi)$ (see Patil, 1959),
we have the following "ratio-estimate" for π:
\[ \pi' = \frac{t_1}{t_1 + t_2}, \tag{4.5} \]
where
\[ t_1 = \sum_{x=c+1}^{n}\frac{x}{n-x+1}n_x \quad \text{and} \quad t_2 = \sum_{x=c}^{n-1} n_x. \]
To investigate the efficiency of π' given by (4.5) its asymptotic variance can be written
down as:
\[ \text{Var}(\pi') = \frac{(1-\pi)^2}{NP^2}[(1-\pi)^2D - P\pi^2 + 2\pi^2P_{n-1}], \tag{4.6} \]
where
\[ P = \sum_{x=c}^{n-1} b^*(x, \pi, n), \qquad D = \sum_{x=c+1}^{n}\Big(\frac{x}{n-x+1}\Big)^2 b^*(x, \pi, n) \]
and $P_{n-1} = b^*(n-1, \pi, n)$.
Also the asymptotic variance of the maximum likelihood estimate $\hat{\pi}$
(Patil, 1959) is given by
\[ \text{Var}(\hat{\pi}) = \frac{\pi^2(1-\pi)^2}{N\mu_2}, \]
where $\mu_2$ is the variance of (4.1).
Therefore the asymptotic efficiency of π' takes the form:
\[ \text{Eff}(\pi') = \frac{\pi^2P^2}{\mu_2[(1-\pi)^2D - P\pi^2 + 2\pi^2P_{n-1}]}. \tag{4.7} \]
The special cases of some importance in genetics are c = 1 and π = 1/4, 1/2,
or 3/4. The efficiency of the Ratio-Estimate (R) relative to the Maximum Likelihood
Estimate (ML) in these cases is tabulated and shown in Table 1.


TABLE 1. ASYMPTOTIC EFFICIENCY OF R FOR c = 1

n      π = 1/4     1/2     3/4
3 924 1875 1875 
4 .909 1769 2779 
5 1919 115 :664 
go 1988 :694 563 
7 947 ‚693 523 
8 952 .705 481 
9 .956 .703 ‚435 
‚®.1@= 8900.76 388 
Examination of the above table shows that the efficiency of R in case of
π = 1/4 and π = 1/2 decreases in the beginning with n, reaches a minimum and then
increases with increasing values of n. For π = 3/4, however, the efficiency decreases
throughout for n = 3(1)10.
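The asymptotic efficiency of the ratio estimate can be evaluated directly from the truncated binomial probabilities. The sketch below is an illustration under stated assumptions: the function name `ratio_efficiency` is hypothetical, and the maximum likelihood variance is taken as $\pi^2(1-\pi)^2/(N\mu_2)$, which is the usual power-series form $\theta^2/(N\mu_2)$ transformed through $\pi = \theta/(1+\theta)$. Exact fractions are used so the n = 3, π = 1/2 case can be compared with the tabulated .875.

```python
from fractions import Fraction
from math import comb

def ratio_efficiency(n, pi, c=1):
    """Asymptotic efficiency of the ratio estimate of pi relative to maximum likelihood
    for the binomial distribution truncated on the left at c (assumes c <= n - 1)."""
    q = 1 - pi
    w = {x: comb(n, x) * pi ** x * q ** (n - x) for x in range(c, n + 1)}
    total = sum(w.values())
    bstar = {x: w[x] / total for x in w}                   # truncated probabilities b*(x, pi, n)
    mean = sum(x * bstar[x] for x in bstar)
    mu2 = sum(x * x * bstar[x] for x in bstar) - mean ** 2  # variance of the truncated law
    P = 1 - bstar[n]
    D = sum(Fraction(x, n - x + 1) ** 2 * bstar[x] for x in range(c + 1, n + 1))
    bracket = q ** 2 * D - P * pi ** 2 + 2 * pi ** 2 * bstar[n - 1]
    return pi ** 2 * P ** 2 / (mu2 * bracket)

eff_half = ratio_efficiency(3, Fraction(1, 2))
eff_three_quarters = ratio_efficiency(3, Fraction(3, 4))
eff_quarter = ratio_efficiency(3, Fraction(1, 4))
print(float(eff_quarter), float(eff_half), float(eff_three_quarters))
```

Passing a float for `pi` also works and returns a float; the fraction inputs are only there to make the check exact.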

Following Section 2, one gets, to order 1/N, the amount of bias of π' (R) as
follows:


Қт’) = О")? рлар (n—2n2)P, (1 np] = дш 


NP? 
The table below gives B(π') for c = 1 and π = 1/4, 1/2 and 3/4.


TABLE 2. N (AMOUNT OF BIAS TO ORDER 1/N) OF R

n      π = 1/4     1/2     3/4
3 = +1927 ‚1458 -0820 
4 .1227 .1307 .0833 
5 .0861 . ‚1184 .0893 
6 -0648 ‚1062 ‚0962 
7 -0510 +0943 +0931 
8 .0420 .0833 .0928 
9 .0355 .0786 -0915 
10 .0301 .0651 .0896 




Table 2 shows that R is an underestimate of π. A closer investigation,
however, brings out that the bias to order 1/N is quite small for R. One may note
that the maximum likelihood estimate also happens to be biased in this
case (Patil, 1959).

Illustrative example. The detailed computation procedure of the ratio estimate
discussed above will be illustrated with reference to K. Pearson's data on albinism
in man. The table below gives the number of families ($n_x$), each of five children, having
exactly x albino children in the family (x = 1, 2, 3, 4, 5).

number of albinos in family (x)     1     2     3     4     5
number of families (n_x)           25    23    10     1     1

If π is the probability for a child to be an albino, we may accept the truncated
binomial model:
\[ \binom{n}{x}\frac{\pi^x(1-\pi)^{n-x}}{1-(1-\pi)^n}, \quad x = 1, 2, \ldots, n, \]
for the probability of x albinos in a family of n. Here n = 5, and the problem is to
estimate π on the basis of the data given in the table above.


Here, $t_2 = \sum_{x=1}^{n-1} n_x = 59$ and $t_1$ can be computed from the following table:

x     x/(n−x+1)     n_x
2     0.5           23
3     1             10
4     2              1
5     5              1

Thus $t_1 = \sum_{x=2}^{n}\{x/(n-x+1)\}n_x = 28.50$.
The ratio estimate is obtained as π' = 28.50/(28.50 + 59) = 0.3257.
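The arithmetic of this example is short enough to reproduce in a few lines. The sketch below uses the family counts quoted in the text; the variable names are illustrative only.

```python
# Pearson's albinism data as quoted above: families of n = 5 children,
# counts[x] = number of families with exactly x albino children.
n, c = 5, 1
counts = {1: 25, 2: 23, 3: 10, 4: 1, 5: 1}

# t1 weights each frequency by a_{x-1}/a_x = x/(n - x + 1); t2 drops the top class.
t1 = sum(x / (n - x + 1) * counts[x] for x in range(c + 1, n + 1))
t2 = sum(counts[x] for x in range(c, n))

theta_ratio = t1 / t2        # ratio estimate of theta = pi / (1 - pi)
pi_ratio = t1 / (t1 + t2)    # ratio estimate of pi, as in (4.5)

print(t1, t2, round(pi_ratio, 4))
```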
To compute the variance of т”, we require З 
A ERNEUT MA Ey ВЕТА 


1—(1—(1—7))" 


p, М1) a. Р) = 0.04408 
n-i л 


апа 
= (555) lt [eom (mon) 5") азам 
where ul (= пп ) = E = i )* (1—m)"-*/{1—(1—a)"} 


and is tabulated by Grab and Savage (1954). 

By linear interpolation from the table by Grab and Savage, 
E (= т, 1-4) = 0.33870 taking 7’ = 0.3257 as the estimate for 7 throughout. 
Then the variance of 7’ is estimated from the formula 


: Varr) Go ап) Рл 22. P, 4]-—-0:0019439—— — 


во that the standard error of 7’ is В.Е. (7) = ero ud Eo d. 
Incidentally, the maximum likelihood estimate 7 comes out (Patil, 1959) 


to be in this case, 7 = 0.3088 with S.E. (л) = 0431. 02502. 
6. ESTIMATION BY RATIO METHOD FOR TRUNCATED POISSON DISTRIBUTION


Problems of estimation in a truncated Poisson distribution with known
truncation points have been discussed by various authors. The case of truncation on
the left has been considered by David and Johnson (1948) who gave the maximum
likelihood estimate, by Plackett (1953) who gave a simple and highly efficient ratio
estimate, and by Rider (1953) who used the first two moments. Truncation on the right
has been discussed by Tippett (1932), Bliss (1948), and Moore (1952). Tippett
derived the maximum likelihood solution; Bliss developed an approximation to it;
and Moore suggested a simple ratio estimate. Uniformly minimum variance unbiased
estimates have been obtained by Roy and Mitra (1957) and by Tate and Goen (1958).
For both types of truncations, the author (1959) has provided neat and compact
equations for estimation by the method of maximum likelihood. He has also presented
numerical tables and some suitable charts to facilitate the solution of these equations
in certain special cases. In this section, we study the Ratio Method as applied to
truncated Poisson distributions.

The probability law of the singly truncated Poisson distribution with truncation
point on the right at d can be written as:
\[ p^*(x, \mu) = [P(d, \mu)]^{-1}\frac{e^{-\mu}\mu^x}{x!}, \quad x = 0, 1, 2, \ldots, d, \tag{6.1} \]
where
\[ P(d, \mu) = \sum_{x=0}^{d}\frac{e^{-\mu}\mu^x}{x!}. \tag{6.2} \]

In this case, $a_{x-1}/a_x = x$, and since θ = μ, the ratio estimate for μ takes the
form
\[ \mu' = \sum_{x=1}^{d} xn_x \Big/ \sum_{x=0}^{d-1} n_x, \tag{6.3} \]

as first suggested by Moore (1954). The following table gives the asymptotic efficiency
of μ' relative to $\hat{\mu}$ for values of d = 5 with μ = .25, .5(.5)2.5, and d = 10
with μ = .5(.5)4.5.


TABLE 3. EFFICIENCY OF R

Case (i) d = 5
μ       .25     .50     1.00    1.50    2.00    2.50
Eff.    .999    .990    .979    .967    .951    .923

Case (ii) d = 10
μ       .5      1.0     1.5     2.0     2.5     3.0     3.5     4.0     4.5
Eff.    1.000   1.000   1.000   1.000   .999    .992    .981    .894    .817

Thus, R seems to be highly efficient on the whole and its efficiency always
exceeds 82 percent in situations considered in Table 3.


The following table gives B(μ') for values of d = 5 with μ = .25, .5(.5)2.5 and
d = 10 with μ = .5(.5)5.


TABLE 4. N (AMOUNT OF BIAS TO ORDER 1/N) OF R

ш 125 50 — 1.0 — 1.50 900 2.50 
Gas () 0 = Б i ER 
B(w’) .0008 ^ .0008 -0015  .0719 2197 .446l 
Case (ii) d = 10 
и ШЕТ оо оао а ЕО о 
Bw) 000 0000 003 0004 0092 0081 0231 0586 1068 1870 


Table 4 shows that, though it is an over-estimate, R involves almost negligible bias.


The probability law of the singly truncated Poisson distribution with truncation
point on the left at c can be written as:
\[ p^*(x, \mu) = [P^*(c, \mu)]^{-1}\frac{e^{-\mu}\mu^x}{x!}, \quad x = c, c+1, \ldots, \infty, \tag{6.8} \]
where
\[ P^*(c, \mu) = \sum_{x=c}^{\infty}\frac{e^{-\mu}\mu^x}{x!}. \tag{6.9} \]

In this case, $a_{x-1}/a_x = x$ and since θ = μ, we have the following "ratio estimate"
for μ:
\[ \mu' = \sum_{x=c+1}^{\infty} xn_x \Big/ N; \tag{6.10} \]

when c = 1, i.e., when only "zero" counts are truncated, the estimate takes the form
suggested by Plackett (1953):
\[ \mu' = \sum_{x=2}^{\infty} xn_x \Big/ N. \tag{6.11} \]

The unique unbiased estimate of μ, linear in the frequencies (ibid., Section 3), is provided
in (6.10). The exact variance of this estimate is
\[ \sigma^2(\mu') = \frac{1}{N}\Big[\sum_{x=c+1}^{\infty} x^2P_x - \mu^2\Big] \tag{6.12} \]
and an unbiased estimate of $\sigma^2(\mu')$ is
\[ \Big[\sum_{x=c+1}^{\infty} x^2n_x - N\mu'^2\Big] \Big/ N(N-1). \tag{6.13} \]
When c = 1, (6.12) reduces to
\[ \sigma^2(\mu') = \frac{\mu}{N}\Big[1 + \frac{\mu}{e^{\mu}-1}\Big], \tag{6.14} \]

first derived by Plackett (1953). Plackett also computed the efficiency of μ' in this
special case. The following table gives the efficiencies of μ' relative to $\hat{\mu}$ as computed
by Plackett.


TABLE 5. EFFICIENCY OF R FOR c = 1

μ       0.5      1.0      1.5      2.0      2.5      3.0      3.5      4.0
Eff.    .9693    .9559    .9539    .9586    .9662    .9743    .9815    .9872
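The efficiencies in Table 5 can be recomputed from the variance (6.14). The sketch below assumes that the asymptotic variance of the maximum likelihood estimate is $\mu^2/(N\mu_2)$, where $\mu_2$ is the variance of the zero-truncated Poisson law (the standard result for a power series family with parameter θ = μ); the function names are hypothetical.

```python
from math import exp

def ratio_variance_factor(mu):
    """N times the exact variance (6.14) of Plackett's estimate for c = 1."""
    return mu * (1 + mu / (exp(mu) - 1))

def ml_variance_factor(mu):
    """N times the asymptotic ML variance, assumed to be mu**2 / Var(X)
    with X following the zero-truncated Poisson distribution."""
    mean = mu / (1 - exp(-mu))
    var_x = mean * (1 + mu - mean)
    return mu ** 2 / var_x

def efficiency(mu):
    """Efficiency of the ratio estimate relative to maximum likelihood."""
    return ml_variance_factor(mu) / ratio_variance_factor(mu)

for mu in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0):
    print(mu, round(efficiency(mu), 4))
```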


The author is thankful to Professor C. R. Rao and Dr. J. Roy for helpful discussions
at the Indian Statistical Institute and to the referee for helpful comments.
METHOD OF CONSTRUCTING TWO MUTUALLY ORTHOGONAL
LATIN SQUARES OF ORDER 3n+1* 
By P. KESAVA MENON 
Ministry of Defence, New Delhi 


SUMMARY. Bose, Shrikhande and Parker (1960) have shown that a pair of mutually
orthogonal latin squares of all orders other than 2 and 6 can be constructed. They have shown
in particular how a pair of orthogonal latin squares of order 3n+1 where n is odd can be constructed
with the help of an orthogonal array.


Given below is a direct method of making two orthogonal squares of order 3n+1
(n ≠ 2, 6).


Let A be the circulantal matrix 


2n 0 1 pe 2n—1 


Let B be the matrix whose first n+1 diagonals (counted from the principal diagonal onwards
from left to right) are respectively the first n+1 rows of A and whose remaining n diagonals
have the constant elements 2n+1, 2n+2, ..., 3n respectively:


0 2% 2n—1.. n42 n+l Qn+1 20-2... 3n—1 3n . 
3n 1 0 ... ntt n43 n42 Qn+1... 8%—2 8п-1 
3n—1 3n 2 ... n46 n-+5 n+4 7-9... 8%—8 3n—2 


2n—5 2—6 2—7 .. 94-9 m43  2n44 20-5 ... 21—3 2n—4 
2n—3 2n—4 9п-5 ... 0-1 2n42 An+3 94-4 ... 2n—12n—2 
2n—1 9п-2 213... т Qn+1 2042 2т--3 ... 3n 2% 


*Editorial Note: It may be seen that the author arrives at the same type of orthogonal squares 
as those constructed by Bose, Shrikhande and Parker. However, the author provides an elegant demons- 
tration of the existence of these squares. 
Let C be the (2n+1) × n matrix whose columns are the last n rows of A reckoned from the
bottom upwards:


to 
в OF в 
з 
+ 
- 


SD WBA d гру А, 
0 1, a п—1 ( 
Let D be the n × (2n+1) matrix formed by the rows of A beginning with 2, 4, 6, ..., 2n
respectively:
А ИА 0 1 
4 5 6 me 1 2 
2n 0 1 a 2n—3 2n—1 24-1 
Finally, let $L_1$, $L_2$ be two mutually orthogonal latin squares of order n formed by
2n+1, 2n+2, ..., 3n.


Then the matrices
\[ M_1 = \begin{pmatrix} B & C \\ D & L_1 \end{pmatrix}, \qquad M_2 = \begin{pmatrix} B' & D' \\ C' & L_2 \end{pmatrix} \]
form two mutually orthogonal latin squares of order 3n+1.


The verification is left to the reader. 
COMBINATORIAL ARRANGEMENTS ANALOGOUS TO
ORTHOGONAL ARRAYS


By C. RADHAKRISHNA RAO
Indian Statistical Institute 


SUMMARY. In this paper are considered some combinatorial arrangements analogous to orthogonal
arrays introduced by the author several years ago (Rao, 1946a). In an orthogonal array, all combinations
of s elements taken d at a time allowing repetitions occur an equal number of times in every set of d
rows of the array. By not allowing repetitions and relaxing the condition that the combinations should
be ordered, certain new arrangements called orthogonal arrays of Type I and Type II have been obtained.
Their relationships with orthogonal latin squares and their use in the construction of BIB designs
have been discussed.


1. INTRODUCTION


In a recent paper the author (Rao, 1961a) considered the construction of BIB
designs with replications 11 to 15. Some solutions depended on certain types of
arrangements analogous to orthogonal arrays introduced earlier (Rao, 1946a, p. 134),
developed and applied in the construction of confounded factorial designs (Rao, 1946b,
1947, 1948). The recent significant work of Bose, Shrikhande and Parker (1960)
and Parker (1958) on the falsity of Euler's conjecture involved the concepts of orthogonal
arrays and closely related arrangements which may be called orthogonal arrays
of Type I. The object of the present paper is to examine certain other arrangements
called orthogonal arrays of Type II which have been found useful in the construction
of dicyclic solutions to BIB designs (Rao, 1961a).


2. DIFFERENT TYPES OF ORTHOGONAL ARRAYS 


Consider a set S of s symbols and a t × N matrix of elements of S. Such
a matrix is called:

(i) an orthogonal array of strength d, constraints t and index λ, and represented
by (N, t, s, d), if in every set of d rows, the N columns contain each of the
$s^d$ ordered combinations of s elements taken d at a time allowing repetitions, λ
times;

(ii) an orthogonal array of Type I, strength d, constraints t and index λ,
and represented by (N, t, s, d) : I, if in every set of d rows, the N columns contain each
of the s!/(s−d)! ordered combinations of s elements taken d at a time without
repetitions, λ times;

(iii) an orthogonal array of Type II, strength d, constraints t and index λ,
and represented by (N, t, s, d) : II, if in every set of d rows, the N columns contain
each of the s!/d!(s−d)! combinations of s elements taken d at a time with order ignored
and without repetitions, λ times.
For example, when s = 3 orthogonal arrays of strength 2 are as follows:

                        A A A B B B C C C
    (9, 4, 3, 2)        A B C A B C A B C
                        A B C B C A C A B
                        A B C C A B B C A

                        A B C A B C
    (6, 3, 3, 2) : I    B C A C A B
                        C A B B C A

                        A B C
    (3, 3, 3, 2) : II   B C A
                        C A B
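The three definitions can be checked mechanically. The following is a minimal sketch, not taken from the paper: the function name `array_index` and the particular s = 3 arrays are illustrative assumptions (any arrays satisfying the definitions would do). It returns the index $\lambda$ when the array is of the requested kind and `None` otherwise.

```python
from collections import Counter
from itertools import combinations, permutations, product


def array_index(rows, symbols, d, kind):
    """Return the index (lambda) if `rows` is an array of the given kind, else None.

    kind = "OA": ordered d-tuples, repetitions allowed
    kind = "I" : ordered d-tuples, no repetitions
    kind = "II": unordered d-subsets, no repetitions
    """
    if kind == "OA":
        targets = set(product(symbols, repeat=d))
    elif kind == "I":
        targets = set(permutations(symbols, d))
    else:
        targets = {frozenset(c) for c in combinations(symbols, d)}

    counts = set()
    for chosen in combinations(range(len(rows)), d):
        seen = Counter()
        for col in range(len(rows[0])):
            entries = tuple(rows[r][col] for r in chosen)
            seen[frozenset(entries) if kind == "II" else entries] += 1
        # every target combination must occur, and nothing else may occur
        if set(seen) != targets:
            return None
        counts.update(seen.values())
    # all combinations must occur the same number of times in every set of d rows
    return counts.pop() if len(counts) == 1 else None


# Illustrative arrays for s = 3 (assumed examples, one string per row).
oa = ["AAABBBCCC", "ABCABCABC", "ABCBCACAB", "ABCCABBCA"]
type_i = ["ABCABC", "BCACAB", "CABBCA"]
type_ii = ["ABC", "BCA", "CAB"]

print(array_index(oa, "ABC", 2, "OA"))
print(array_index(type_i, "ABC", 2, "I"))
print(array_index(type_ii, "ABC", 2, "II"))
```

Read as a Type II array, the six-column Type I array has index 2, which is the remark made later for even s; the three-column cyclic array is of Type II but not of Type I.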


For orthogonal arrays of strength 2, the minimum value of N and the maximum
value of t, for given s, are

    array         min N                      max t
    ----------    -----------------------    -----
    orthogonal    $s^2$                      $s+1$
    type I        $s(s-1)$                   $s$
    type II       $s(s-1)/2$ for s odd       $s$
                  $s(s-1)$ for s even



In the present paper it is shown that arrays of Type II and strength 2 can be
constructed with the optimum values of N and t when a field with s elements, GF(s),
exists. For other values of s, the methods used by Parker (1958), and Bose,
Shrikhande and Parker (1960) for the construction of orthogonal arrays may be
employed.

3. ARRAYS OF TYPE I AND STRENGTH 2 


It is known (Rao, 1946a) that $(s^2, t, s, 2) \Leftrightarrow$ the existence of $(t-2)$ mutually
orthogonal latin squares (m.o.l.s.). Theorem 1 contains some results establishing
the relationship between Type I arrays and m.o.l.s.


Theorem 1: (i) $(t-1)$ m.o.l.s. of size s $\Rightarrow$ $(s(s-1), t, s, 2)$ : I

(ii) $(s(s-1), t, s, 2)$ : I $\Rightarrow$ $(t-2)$ m.o.l.s. of size s

(iii) $(s^2, t, s, 2)$ $\Rightarrow$ $(s(s-1), t-1, s, 2)$ : I

(iv) $(t-2)$ m.o.l.s. of size s with a common directrix $\Leftrightarrow$ $(s(s-1), t, s, 2)$ : I.
A latin square of size s is said to have a directrix if s different symbols can be
found in s cells with the property that no two cells are in the same row or column.
Result (iv) of Theorem 1 is important and has been used earlier in the construction of
quasi-factorial designs (Rao, 1956) and confounded designs for asymmetrical factorial
experiments (Nair and Rao, 1948). In the latter paper it has been shown that
$(30, 3, 6, 2)$ : I exists although there are no orthogonal squares of size 6, since a latin
square of size 6 with a directrix can be written down. It is interesting to note that
the method of construction given by Bose, Shrikhande and Parker (1960) always leads
to orthogonal squares with a common directrix.


4. ARRAYS OF TYPE II AND STRENGTH 2


Theorem 2 contains the main result in the construction of orthogonal arrays
of Type II.


Theorem 2: Existence of GF(s) $\Rightarrow$ $(s(s-1)/2, s, s, 2)$ : II when s is odd.


When s is even, the minimum value of N is $s(s-1)$ and a Type I array
can be constructed with this value of N if GF(s) exists, which is also a Type II
array with index 2. The only interesting case is when s is odd.


Let $\alpha_0 = 0, \alpha_1, \ldots, \alpha_{s-1}$ be the elements of GF(s). They can also be written
as $\alpha_0 = 0, \beta_1, \ldots, \beta_{(s-1)/2}, -\beta_1, \ldots, -\beta_{(s-1)/2}$ when s is odd. Consider the vectors

    $(\beta_i\alpha_0, \beta_i\alpha_1, \ldots, \beta_i\alpha_{s-1}), \quad i = 1, \ldots, (s-1)/2.$    ... (4.1)

The symmetric differences of elements occurring in the $(r+1)$-th and $(u+1)$-th
positions are

    $\pm\beta_i(\alpha_r - \alpha_u), \quad i = 1, \ldots, (s-1)/2.$    ... (4.2)

Since $(\alpha_r - \alpha_u) \neq 0$, the differences (4.2) include all the non-zero elements of GF(s)
exactly once each. Hence by an application of a theorem due to Bose (1939) it
follows that the totality of sets

    $(\beta_i\alpha_0 + \alpha_j, \beta_i\alpha_1 + \alpha_j, \ldots, \beta_i\alpha_{s-1} + \alpha_j)$    ... (4.3)
    $i = 1, \ldots, (s-1)/2; \quad j = 0, 1, \ldots, s-1$

are such that in any two positions all combinations of the s elements taken two at a
time occur exactly once each. The sets (4.3) written as columns provide
$(s(s-1)/2, s, s, 2)$ : II, which proves Theorem 2.
For example, if s = 5, the residue classes 0, 1, 2, 3, 4 (mod 5) form a field.
We write the elements as 0, 1, 2, -1, -2, and hence the key sets (4.1) are
    (0, 1, 2, 3, 4)
    (0, 2, 4, 1, 3)
the latter being obtained from the former by multiplying by 2. Writing them vertically
(as shown below in blocks) and generating the other columns by addition of the
elements of GF(5) as indicated in (4.3) we obtain the 10 columns

    0 | 1 2 3 4 | 0 | 1 2 3 4
    1 | 2 3 4 0 | 2 | 3 4 0 1
    2 | 3 4 0 1 | 4 | 0 1 2 3
    3 | 4 0 1 2 | 1 | 2 3 4 0
    4 | 0 1 2 3 | 3 | 4 0 1 2



which is $(10, 5, 5, 2)$ : II. This also happens to be $(10, 5, 5, 3)$ : II, of strength
3, but a general method of constructing arrays of Type II and strength 3 has to be
investigated. Considering only the first three rows we obtain $(10, 3, 5, 2)$ : II, which
has been used in deriving a cyclic solution to the combinatorial assignment problem
with 36 officers (Rao, 1961b). Some other applications will be considered in a subsequent
communication.
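The construction of Theorem 2 is easy to reproduce when s is an odd prime, so that GF(s) is the integers modulo s. The sketch below is an illustration under that restriction only (prime powers would need genuine field arithmetic); the names `type_ii_array` and `is_type_ii` are assumptions, and the multipliers $\beta_i$ are taken to be $1, \ldots, (s-1)/2$.

```python
from collections import Counter
from itertools import combinations


def type_ii_array(p):
    """Rows of the Type II array built from GF(p), p an odd prime.

    Columns are (b*a + j) mod p over a = 0..p-1, for each multiplier
    b = 1..(p-1)/2 and each shift j = 0..p-1, as in (4.1) and (4.3).
    """
    columns = []
    for b in range(1, (p - 1) // 2 + 1):
        key = [(b * a) % p for a in range(p)]          # key set (4.1)
        for j in range(p):
            columns.append([(x + j) % p for x in key])  # developed set (4.3)
    # transpose: one list per row of the array
    return [[col[r] for col in columns] for r in range(p)]


def is_type_ii(rows, s, d):
    """True if every set of d rows contains each d-subset of the symbols exactly once."""
    targets = {frozenset(c) for c in combinations(range(s), d)}
    for chosen in combinations(range(len(rows)), d):
        seen = Counter(
            frozenset(rows[r][c] for r in chosen) for c in range(len(rows[0]))
        )
        if set(seen) != targets or any(n != 1 for n in seen.values()):
            return False
    return True


rows5 = type_ii_array(5)
for row in rows5:
    print(*row)
print(is_type_ii(rows5, 5, 2), is_type_ii(rows5, 5, 3))
```

For s = 5 the printed rows match the ten columns displayed above, and the check confirms both the strength 2 and the strength 3 property noted in the text. For s = 7 the same construction gives 21 columns of strength 2.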
ON THE STRUCTURE AND COMBINATORIAL PROPERTIES
OF CERTAIN SEMI-REGULAR GROUP DIVISIBLE DESIGNS


By W. S. SARAF
College of Science, Nagpur* 

SUMMARY. In this paper, the structure of semi-regular group divisible designs is investigated and
inequalities are found for the number of treatments common to two blocks. Two particular cases of
semi-regular group divisible design have been studied in detail and efficient necessary conditions have been
deduced for them. One theorem gives a combinatorial property of certain semi-regular group divisible
designs. Besides these, there are some minor results.


1. INTRODUCTION 


An incomplete design with $v$ treatments, each replicated $r$ times in $b$ blocks,
each of size $k$, forms a group divisible (GD) design if the $v$ treatments can be divided
into $m$ groups of $n$ treatments each such that any two treatments belonging to the same
group occur together in $\lambda_1$ blocks and any two treatments belonging to different groups
occur together in $\lambda_2$ blocks; $\lambda_1 \neq \lambda_2$. Obviously

    $v = mn, \quad bk = vr$    ... (1.1)

and $r(k-1) = (n-1)\lambda_1 + n(m-1)\lambda_2$.    ... (1.2)

In this paper we shall study only semi-regular group divisible (SRGD) designs.

For SRGD designs, $(r-\lambda_1) > 0$ and $rk - v\lambda_2 = 0$. Hence (1.2) reduces to

    $r + (n-1)\lambda_1 = n\lambda_2$.    ... (1.3)


It has been shown by Bose and Connor (1952) that in a SRGD
design $k$ is divisible by $m$ and every block contains $c = k/m$
treatments from each group.    ... (1.4)


Throughout this paper, in numbering the treatments, we shall adopt the
convention that the $w$-th group contains the treatments numbered
$(w-1)n+1, (w-1)n+2, \ldots, (w-1)n+n = wn$; $w = 1, 2, \ldots, m$.
Let $N$ denote the incidence matrix of the design. Then

    $N = \begin{bmatrix} n_{11} & n_{12} & \cdots & n_{1b} \\ n_{21} & n_{22} & \cdots & n_{2b} \\ \vdots & \vdots & & \vdots \\ n_{v1} & n_{v2} & \cdots & n_{vb} \end{bmatrix} \quad (v \times b)$    ... (1.5)


*Now, District Statistical Officer, Amravati.
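Here, the rows correspond to treatments, the columns correspond to blocks and
$n_{ij} = 1$ or 0 according as the $i$-th treatment does or does not occur in the $j$-th block.
It is easy to see that

    $\sum_{j=1}^{b} n_{ij} = \sum_{j=1}^{b} n_{ij}^2 = r$,    ... (1.6a)

    $\sum_{j=1}^{b} n_{ij} n_{uj} = \lambda_1$ or $\lambda_2$    ... (1.6b)

according as the $i$-th and $u$-th treatments do or do not belong to the same group.
Hence, if $N'$ denotes the transpose matrix of $N$

    $NN' = \begin{bmatrix} A & B & \cdots & B \\ B & A & \cdots & B \\ \vdots & \vdots & & \vdots \\ B & B & \cdots & A \end{bmatrix}$    ... (1.7a)

where $A$ and $B$ are $n \times n$ matrices defined by

    $A = \begin{bmatrix} r & \lambda_1 & \cdots & \lambda_1 \\ \lambda_1 & r & \cdots & \lambda_1 \\ \vdots & \vdots & & \vdots \\ \lambda_1 & \lambda_1 & \cdots & r \end{bmatrix}, \quad B = \begin{bmatrix} \lambda_2 & \lambda_2 & \cdots & \lambda_2 \\ \lambda_2 & \lambda_2 & \cdots & \lambda_2 \\ \vdots & \vdots & & \vdots \\ \lambda_2 & \lambda_2 & \cdots & \lambda_2 \end{bmatrix}$.    ... (1.7b)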


2. GENERAL BOUNDS ON THE NUMBER OF TREATMENTS COMMON TO TWO BLOCKS


Choose any $t < b$ blocks of the design. Let the sub-matrix of $N$ that corresponds
to these $t$ blocks be denoted by $N_0$ and let $s_{ju}$ be the number of treatments
common to the $j$-th and $u$-th blocks ($j, u = 1, 2, \ldots, t$). Then the $t \times t$ matrix

    $N_0'N_0 = S_t = (s_{ju})$    ... (2.1)

is the structural matrix of the $t$ chosen blocks. Adjoin $m$ columns to the incidence
matrix $N$ so that the $i$-th adjoined column consists of 1's in the rows that correspond
to the treatments of the $i$-th group and 0's elsewhere; $i = 1, 2, \ldots, m$. If this
augmented matrix be denoted by $N_1$, then

    $N_1N_1' = \begin{bmatrix} D & B & \cdots & B \\ B & D & \cdots & B \\ \vdots & \vdots & & \vdots \\ B & B & \cdots & D \end{bmatrix}$    ... (2.2a)
where $D$ is an $n \times n$ matrix defined by

    $D = \begin{bmatrix} r+1 & \lambda_1+1 & \cdots & \lambda_1+1 \\ \lambda_1+1 & r+1 & \cdots & \lambda_1+1 \\ \vdots & \vdots & & \vdots \\ \lambda_1+1 & \lambda_1+1 & \cdots & r+1 \end{bmatrix}$.    ... (2.2b)

Using (1.1) and (1.3) we get

    $|N_1N_1'| = (rk+n)\, n^{m-1} (r-\lambda_1)^{m(n-1)}$    ... (2.2c)

which is always non-vanishing for a SRGD design.


Let the columns of $N_1$ be permuted so that the first $t$ columns correspond
to the chosen blocks; then the sub-matrix of $N_1$ that corresponds to these $t$ blocks
will be $N_0$. Now, adjoin $t$ row vectors to $N_1$ such that the $i$-th row consists of zero
elements everywhere except the $i$-th which is unity. If this resultant matrix of order
$(v+t) \times (b+m)$ be denoted by $N_2$, then

    $N_2 = \begin{bmatrix} N_1 \\ I_t \;\; 0 \end{bmatrix}$    ... (2.3)

where $I_t$ is the identity matrix of order $t$ and 0 is a $t \times (b+m-t)$ matrix with all null
elements. Further

    $N_2N_2' = \begin{bmatrix} N_1N_1' & N_0 \\ N_0' & I_t \end{bmatrix}$    ... (2.4a)

and $|N_2N_2'| = (n+v\lambda_2)^{-(t-1)}\, n^{m-1} (r-\lambda_1)^{m(n-1)-t}\, |C_t| \geq 0$    ... (2.4b)

where $C_t$ is of order $t \times t$ and its elements are

    $c_{ju} = (n+v\lambda_2)\delta_{ju} + kc(\lambda_1+1-\lambda_2) + k^2\lambda_2$    ... (2.5)

where $\delta_{ju} = (r-\lambda_1-k)$ or $-s_{ju}$ according as $j = u$ or $j \neq u$. The matrix $C_t$ given
by (2.4b) is defined as the characteristic matrix of the $t$ chosen blocks. It is a
symmetric matrix. If $S_t$ is the structural matrix of the $t$ blocks then there exists a
one-to-one correspondence between the elements of $C_t$ and $S_t$. The relationship
between the matrices $C_t$ and $S_t$ can be expressed as

    $C_t = (n+v\lambda_2)(r-\lambda_1)I_t - (n+v\lambda_2)S_t + [kc(\lambda_1+1-\lambda_2) + k^2\lambda_2]E_t$    ... (2.6)

where $E_t$ is a $t \times t$ matrix with all elements unity.
We can now state the following.
Theorem 1: If $C_t$ is the characteristic matrix of any set of $t$ blocks chosen
from a SRGD design with parameters $v, b, r, k, \lambda_1, \lambda_2, m, n$, then (i) $|C_t| \geq 0$ if
$t < b+m-v$; (ii) $|C_t| = 0$ if $t > b+m-v$ and (iii) the expression for $|N_2N_2'|$ in (2.4b)
is a perfect square if $t = b+m-v$.


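As a simple application of Theorem 1, we get the following inequalities
for $s_{ju}$. Let $t = 2$. Since the factor outside $|C_2|$ in (2.4b) is positive,

    $|C_2| = [(n+v\lambda_2)(r-\lambda_1-k) + kc(\lambda_1+1-\lambda_2) + k^2\lambda_2]^2$
    $\qquad - [-(n+v\lambda_2)s_{12} + kc(\lambda_1+1-\lambda_2) + k^2\lambda_2]^2 \geq 0$.    ... (2.7)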


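Simplifying this, we get the following:
Corollary 1: For a SRGD design, the number of treatments common to
any two blocks satisfies the inequalities

    $\dfrac{2[kc(\lambda_1+1-\lambda_2) + k^2\lambda_2]}{n+v\lambda_2} + (r-\lambda_1-k) \geq s_{ju} \geq -(r-\lambda_1-k)$.

For symmetric SRGD design, i.e., the design with $r = k$, $s_{ju}$ satisfies

    $\dfrac{2[kc(\lambda_1+1-\lambda_2) + k^2\lambda_2]}{n+v\lambda_2} - \lambda_1 \geq s_{ju} \geq \lambda_1$.    ... (2.8)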


Connor (1952) has proved that for a symmetric SRGD design $s_{ju}$ satisfies the
inequalities


"s —A4- "s =n T6. ^ 2 (@.9) 


It is easy to see that (2.8) gives 12501 bounds for Sju than (2.9). When (А.—А,) 
ғ- 1, (2.8) and (2.9) are identical. 


3. SRGD OF LINKED BLOCK TYPE


Roy and Laha (1956) have proved that for a LB type SRGD design
$b = v-m+1$. Consider the characteristic matrix of $t = 2$ blocks of this design.
Obviously $t > b+m-v$ and hence by (ii) of Theorem 1,

    $|C_2| = [(n+v\lambda_2)(r-\lambda_1-k) + kc(\lambda_1+1-\lambda_2) + k^2\lambda_2]^2$
    $\qquad - [-(n+v\lambda_2)s_{12} + kc(\lambda_1+1-\lambda_2) + k^2\lambda_2]^2 = 0$    ... (3.1)

or $(n+v\lambda_2)(r-\lambda_1-k) + kc(\lambda_1+1-\lambda_2) + k^2\lambda_2$
    $\qquad = \pm[-(n+v\lambda_2)s_{12} + kc(\lambda_1+1-\lambda_2) + k^2\lambda_2]$.    ... (3.2)

The negative sign on the right hand side of (3.2) is discarded since it gives a fractional
value of $s_{12}$ which is essentially an integer. Prefixing the positive sign to the right hand
side of (3.2), we get

    $s_{12} = k + \lambda_1 - r$.    ... (3.3)

This important result which determines the structure of this design completely is
stated in the following theorem.

Theorem 2: For a SRGD design with $b = v-m+1$, any two blocks have
$(k+\lambda_1-r)$ treatments in common.

From this theorem, it is obvious that when a SRGD design is dualised, the
dual design is a BIB design. Hence the following corollary.

Corollary 2: If an LB type SRGD design with parameters $v, b, r, k, m, n,
\lambda_1, \lambda_2$ is dualised, we get a BIB design with the following parameters:

    $v^* = b, \quad b^* = v, \quad r^* = k, \quad k^* = r, \quad \lambda^* = k+\lambda_1-r$.
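Theorem 2 and Corollary 2 can be illustrated on a small linked block SRGD design. The design below is a hypothetical example chosen for this sketch, not one given in the paper: six treatments in three groups of two, four blocks of size three, $\lambda_1 = 0$, $\lambda_2 = 1$.

```python
from itertools import combinations

# Hypothetical LB type SRGD design (an assumption of this sketch).
groups = [{1, 2}, {3, 4}, {5, 6}]
blocks = [{1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
v, b, r, k, m, n, lam1, lam2 = 6, 4, 2, 3, 3, 2, 0, 1


def concurrence(x, y):
    """Number of blocks containing both treatments x and y."""
    return sum(1 for blk in blocks if x in blk and y in blk)


def same_group(x, y):
    return any(x in g and y in g for g in groups)


# Group divisible: concurrence is lam1 within groups, lam2 across groups.
gd_ok = all(
    concurrence(x, y) == (lam1 if same_group(x, y) else lam2)
    for x, y in combinations(range(1, v + 1), 2)
)
semi_regular = (r - lam1 > 0) and (r * k - v * lam2 == 0)
linked = b == v - m + 1

# Theorem 2: every pair of blocks shares k + lam1 - r treatments.
common = {len(a & c) for a, c in combinations(blocks, 2)}

# Corollary 2: in the dual, blocks become treatments and treatments become blocks.
dual_blocks = [
    {j for j, blk in enumerate(blocks) if x in blk} for x in range(1, v + 1)
]
dual_pair_counts = {
    sum(1 for d in dual_blocks if i in d and j in d)
    for i, j in combinations(range(b), 2)
}

print(gd_ok, semi_regular, linked, common, dual_pair_counts)
```

Every pair of blocks shares exactly $k + \lambda_1 - r = 1$ treatment, so the dual has $v^* = 4$, $b^* = 6$, $r^* = 3$, $k^* = 2$ and every pair of its treatments occurs together once, as the corollary states.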
The following two corollaries are also obvious.

Corollary 3: In an LB type SRGD design, if one block is omitted and in
the remaining blocks only those treatments are retained that occur in the omitted
block, then a GD design with the following parameters is obtained.

    $b^* = b-1$, $v^* = k$, $r^* = r-1$, $k^* = k+\lambda_1-r$, $m^* = m$, $n^* = c$, $\lambda_1^* = \lambda_1-1$, $\lambda_2^* = \lambda_2-1$.

Corollary 4: If one block and all treatments occurring in it are omitted
from an LB type SRGD design then a GD design with the following parameters is
obtained.

    $b^* = b-1$, $v^* = v-k$, $r^* = r$, $k^* = r-\lambda_1$, $m^* = m$, $n^* = n-c$, $\lambda_1^* = \lambda_1$, $\lambda_2^* = \lambda_2$.
From Theorem 2, we can obtain a necessary condition for the existence of SRGD
design with $b = v-m+1$.

To the incidence matrix $N$, adjoin $(m-1)$ new columns such that the $i$-th
adjoined column has 1's in the positions $(i-1)n+1, (i-1)n+2, \ldots, (i-1)n+n$ and 0's
elsewhere. If this resultant square matrix of order $v$ is denoted by $A$, then using
Theorem 2 we have:

    $A'A = \begin{bmatrix} (r-\lambda_1)I_b + (k+\lambda_1-r)E_{b \times b} & cE_{b \times (m-1)} \\ cE_{(m-1) \times b} & nI_{m-1} \end{bmatrix}$    ... (3.4a)

where $E_{p \times q}$ is a $p \times q$ matrix with all elements unity, and

    $|A'A| = |A|^2 = n^{m}\lambda_2(r-\lambda_1)^{v-m} = n^{v}\lambda_2(\lambda_2-\lambda_1)^{v-m}$    ... (3.4b)

which must be a perfect integral square. This leads to the following theorem.
Theorem 3: (i) With even number of treatments in an LB type SRGD
design, $\lambda_2$ or $\lambda_2(\lambda_2-\lambda_1)$ must be a perfect integral square according as $b$ is odd or even.


(ii) With odd number of treatments in an LB type SRGD design $n\lambda_2$ is a
perfect square.


As an illustration, consider the design with $v = 93$, $b = 63$, $k = 31$, $r = 21$,
$m = 31$, $n = 3$, $\lambda_1 = 0$, $\lambda_2 = 7$ as the parameters. For this design $n\lambda_2 = 21$. Hence
according to (ii) of Theorem 3 the design is impossible.

For an LB type SRGD design with $\lambda_1 = 0$, $\lambda_2 = 1$ any two blocks have exactly
one treatment in common. Therefore, by Theorem 2, substituting $\lambda_1 = 0$,

    $k - r = 1$ or $r = k-1$.    ... (3.5)

Hence, we have proved the following lemma.


Lemma 1: For an LB type SRGD design with $\lambda_1 = 0$, $\lambda_2 = 1$, $r = k-1$.

Using (1.1), (1.3) and Lemma 1, it is easy to see that the parameters of an LB
type SRGD design with $\lambda_1 = 0$ and $\lambda_2 = 1$ can be expressed as:


    $v = m(m-1)$, $b = (m-1)^2$, $r = (m-1)$, $k = m$, $n = m-1$, $m$, $\lambda_1 = 0$, $\lambda_2 = 1$.
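The necessary condition of Theorem 3 and the parameter family following Lemma 1 lend themselves to a quick numerical check. The sketch below assumes Theorem 3 is read as: $\lambda_2$ or $\lambda_2(\lambda_2-\lambda_1)$ a perfect square for even $v$ (according as $b$ is odd or even), and $n\lambda_2$ a perfect square for odd $v$. The function names are illustrative.

```python
from math import isqrt


def is_square(x):
    """True if x is a non-negative perfect square."""
    return x >= 0 and isqrt(x) ** 2 == x


def theorem3_allows(v, b, n, lam1, lam2):
    """Necessary (not sufficient) condition for an LB type SRGD design."""
    if v % 2 == 0:
        if b % 2 == 1:
            return is_square(lam2)
        return is_square(lam2 * (lam2 - lam1))
    return is_square(n * lam2)


def lb_unit_params(m):
    """Parameters of an LB type SRGD design with lam1 = 0, lam2 = 1 (after Lemma 1)."""
    return dict(v=m * (m - 1), b=(m - 1) ** 2, r=m - 1, k=m, n=m - 1, m=m,
                lam1=0, lam2=1)


# The illustration in the text: n * lam2 = 21 is not a perfect square.
print(theorem3_allows(v=93, b=63, n=3, lam1=0, lam2=7))

# The family after Lemma 1 satisfies (1.1), (1.3), rk = v*lam2 and b = v - m + 1.
for m in range(3, 8):
    p = lb_unit_params(m)
    assert p["b"] * p["k"] == p["v"] * p["r"]
    assert p["r"] + (p["n"] - 1) * p["lam1"] == p["n"] * p["lam2"]
    assert p["r"] * p["k"] == p["v"] * p["lam2"]
    assert p["b"] == p["v"] - p["m"] + 1
```

The check only rules designs out; passing it says nothing about existence.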


Majumdar (1953) has shown that a symmetric BIB design is the only design
in which any two varieties occur together in $\lambda$ blocks and any two blocks have $\lambda'$
varieties ($\lambda = \lambda'$) in common. A somewhat similar property holds good for an
LB type SRGD design. It is stated as follows:


If in an incomplete design, $v = mn$ treatments divided into $m$ groups of $n$
treatments each, are applied in $b$ blocks such that any two treatments belonging to
the same group occur together in $\lambda_1$ blocks, any two treatments belonging to different
groups occur together in $\lambda_2$ blocks ($\lambda_1 < \lambda_2$) and any two blocks have $\lambda$ ($\neq 0$) treatments
in common, then the design is an LB type SRGD design.


The proof is implicitly contained in Roy and Laha (1956). 


4. AFFINE RESOLVABILITY OF SRGD DESIGN

In a resolvable design, the blocks are separable into groups, each group (or
replication) containing all varieties, each variety once and only once. Each replication
must necessarily contain the same number of blocks, say $t$, so that

    $b = tr, \quad v = tk$.    ... (4.1)

We shall now prove the following theorem.

Theorem 4: For a resolvable SRGD design with $b = v-m+r$, any two
blocks not belonging to the same replication have the same number of treatments in common.
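Let $B_{ij}$ denote the $i$-th block in the $j$-th replication, $i = 1, 2, \ldots, t$;
$j = 0, 1, 2, \ldots, r-1$, and $l_{ij}$ the number of common treatments in the blocks $B_{10}$ and
$B_{ij}$; $i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, r-1$. Then

    $\sum_{i=1}^{t}\sum_{j=1}^{r-1} l_{ij} = k(r-1)$.    ... (4.2)

Considering the combinations of varieties taken two at a time from the
block $B_{10}$:

    $\sum_{i=1}^{t}\sum_{j=1}^{r-1} \dfrac{l_{ij}(l_{ij}-1)}{2} = \dfrac{mc(c-1)(\lambda_1-1)}{2} + \dfrac{m(m-1)c^2(\lambda_2-1)}{2}$    ... (4.3)

or $\sum_{i=1}^{t}\sum_{j=1}^{r-1} l_{ij}^2 = k(c-1)(\lambda_1-1) + kc(m-1)(\lambda_2-1) + k(r-1)$.    ... (4.4)

Let $\bar{l}$ denote the mean and $s^2$ be the mean sum of squares for the $t(r-1)$
quantities $l_{ij}$; $i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, (r-1)$. Then

    $\bar{l} = \dfrac{1}{t(r-1)}\sum_{i=1}^{t}\sum_{j=1}^{r-1} l_{ij} = \dfrac{k}{t}$    ... (4.5)

and $s^2 = \dfrac{1}{t(r-1)}\sum_{i=1}^{t}\sum_{j=1}^{r-1} l_{ij}^2 - \bar{l}^2$.    ... (4.6)

From (4.4)

    $\sum_{i=1}^{t}\sum_{j=1}^{r-1} l_{ij}^2 = k(c-1)(\lambda_1-1) + kc(m-1)(\lambda_2-1) + k(r-1)$
    $\qquad = k(r-\lambda_1)\dfrac{(v-k)}{v} + k^2(\lambda_2-1)$    ... (4.7)

and $t(r-1)\bar{l}^2 = \dfrac{k^2(r-1)}{t}$.    ... (4.8)

This simplifies to

    $t(r-1)s^2 = \dfrac{k}{v}(v-k)(r-k-\lambda_1)$.    ... (4.9)

But $s^2 \geq 0$ and since $v > k$ we have

    $r \geq k+\lambda_1$.    ... (4.10)

When the equality holds good, i.e., when $r = k+\lambda_1$, then $s^2 = 0$, i.e.,
each $l_{ij} = \bar{l}$ or $k^2/v$. Such designs may be called "affine resolvable SRGD designs."

Now for affine resolvable SRGD design,

    $r = k+\lambda_1$    ... (4.11)

or $b(r-\lambda_1) = vr$.    ... (4.12)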
Since $r-\lambda_1 > 0$, this simplifies to

    $b = v-m+r$.    ... (4.13)

Hence if the design is resolvable then if $b = v-m+r$, it is affine resolvable and
conversely. This proves the theorem. Also, it is clear that for affine resolvable design,
$k/t$ must be an integer.
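Theorem 4 can be seen at work on a small affine resolvable SRGD design. The design below is a hypothetical example built for this sketch from the integers modulo 3, with treatments labelled (group, element); it is not taken from the paper.

```python
from itertools import combinations

q = 3  # illustrative size: three groups of three treatments
v, b, r, k, m, n, lam1, lam2, t = 9, 9, 3, 3, 3, 3, 0, 1, 3

# Replication c holds the blocks {(0, x), (1, x + c), (2, 2x + c)}, x in Z_3.
replications = [
    [frozenset({(0, x), (1, (x + c) % q), (2, (2 * x + c) % q)}) for x in range(q)]
    for c in range(q)
]
treatments = [(g, x) for g in range(m) for x in range(n)]
blocks = [blk for rep in replications for blk in rep]


def concurrence(a, c):
    """Number of blocks containing both treatments a and c."""
    return sum(1 for blk in blocks if a in blk and c in blk)


# Group divisible structure: lam1 within a group, lam2 across groups.
is_gd = all(
    concurrence(a, c) == (lam1 if a[0] == c[0] else lam2)
    for a, c in combinations(treatments, 2)
)
# Resolvable: each replication contains every treatment exactly once.
is_resolvable = all(
    sorted(x for blk in rep for x in blk) == treatments for rep in replications
)
# Numbers of treatments common to blocks from different replications.
cross = {
    len(b1 & b2)
    for rep1, rep2 in combinations(replications, 2)
    for b1 in rep1
    for b2 in rep2
}

print(is_gd, is_resolvable, b == v - m + r, r == k + lam1, cross)
```

Here $b = v-m+r = 9$ and $r = k+\lambda_1 = 3$, and every two blocks from different replications share $k^2/v = 1$ treatment, in line with the theorem.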


With the help of the above theorem, a necessary condition for the existence 
of the affine resolvable SRGD design can be deduced. 


In forming the incidence matrix $N$ for a resolvable SRGD design the blocks
shall be arranged in such a way that the first $t$ columns correspond to the blocks of the
first replication; next $t$ columns to those of the second replication and so on.


To the incidence matrix $N$ adjoin $m$ columns such that the $j$-th adjoined
column contains 1's in the positions $(j-1)n+1, (j-1)n+2, (j-1)n+3, \ldots, (j-1)n+n = jn$,
and 0's elsewhere. Then, this new matrix is extended by joining $r$ rows
such that the $i$-th adjoined row will have 1's in the positions $(i-1)t+1, (i-1)t+2,
(i-1)t+3, \ldots, (i-1)t+t = it$, and 0's elsewhere.


If this augmented square matrix of order $(v+r)$ be denoted by $N^*$ then

    $N^*N^{*\prime} = \begin{bmatrix} N_1N_1' & E_{v \times r} \\ E_{r \times v} & tI_r \end{bmatrix}$    ... (4.14)

where $N_1N_1'$ is given by (2.2a) and $E_{p \times q}$ is a $p \times q$ matrix with all elements unity.

Using (1.1), (1.3) and $rk - v\lambda_2 = 0$

    $|N^*N^{*\prime}| = |N^*|^2 = t^r n^m (r-\lambda_1)^{v-m} = t^r n^m k^{b-r}$    ... (4.15)

since $r-\lambda_1 = k$, $b = v-m+r$ and $tk = v$.
(4.15) must be a perfect square.
We can now state the following theorem.


Theorem 5: (i) With even number of blocks in an affine resolvable SRGD
design (a) $m$ must be a perfect square if $m$ and $r$ are odd. (b) $n$ must be a perfect square
if $m$ is odd and $r$ is even.

(ii) With odd number of blocks in an affine resolvable SRGD design (a) $c$ must be
a perfect square if $m$ and $r$ are odd. (b) $nk$ must be a perfect square if $m$ is odd and $r$ is even.
(c) $t$ must be a perfect square if $m$ is even and $r$ is odd.


5. COMBINATORIAL PROPERTY FOR CERTAIN SRGD DESIGNS 


We shall conclude this paper after proving a combinatorial property for certain
SRGD designs. It is stated in the following:


Theorem 6: The necessary and sufficient condition for the existence, in a
SRGD design, of two blocks such that the number of treatments common to any other block
and these two are equal, is that the two blocks contain $(k+\lambda_1-r)$ common treatments.

The proof is almost on the same lines as given by Majumdar (1953) for a similar
property of a BIB design.


Let the two blocks which may be called the first and second blocks have $s$
treatments in common. Of these $s$ treatments, let there be $l_j$ treatments from the
$j$-th group, $j = 1, 2, \ldots, m$.


Let the $i$-th block have $x_{ij}$ treatments from the $j$-th group that occur in the first
block but not in the second; and $y_{ij}$ treatments from the $j$-th group that occur in the second
but not in the first block.


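Then, considering the combinations of treatments, taken two at a time,

    $\sum_{i=3}^{b}\sum_{j=1}^{m} x_{ij}(x_{ij}-1) + \sum_{i=3}^{b}\sum_{j \neq j'} x_{ij}x_{ij'} = \sum_{i=3}^{b}\sum_{j=1}^{m} y_{ij}(y_{ij}-1) + \sum_{i=3}^{b}\sum_{j \neq j'} y_{ij}y_{ij'}$
    $\qquad = \sum_{j=1}^{m}(c-l_j)(c-l_j-1)(\lambda_1-1) + \sum_{j \neq j'}(c-l_j)(c-l_{j'})(\lambda_2-1)$,    ... (5.1)

    $\sum_{i=3}^{b}\sum_{j=1}^{m} x_{ij}y_{ij} = \sum_{j=1}^{m}(c-l_j)^2\lambda_1$,    ... (5.2)

    $\sum_{i=3}^{b}\sum_{j \neq j'} x_{ij}y_{ij'} = \sum_{j \neq j'}(c-l_j)(c-l_{j'})\lambda_2$,    ... (5.3)

    $\sum_{i=3}^{b}\sum_{j=1}^{m} x_{ij} = \sum_{i=3}^{b}\sum_{j=1}^{m} y_{ij} = (r-1)\sum_{j=1}^{m}(c-l_j)$.    ... (5.4)

Using these we have

    $\sum_{i=3}^{b}\Big(\sum_{j=1}^{m} x_{ij} - \sum_{j=1}^{m} y_{ij}\Big)^2 = 2(k-s)(s-k-\lambda_1+r)$.    ... (5.5)

Hence, if $s = k+\lambda_1-r$, $\sum_{j=1}^{m} x_{ij} = \sum_{j=1}^{m} y_{ij}$ for $i = 3, \ldots, b$,
and conversely. This proves the theorem.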


I wish to acknowledge my sincere thanks to Professor S. S. Shrikhande, who
verified the proofs and suggested important corrections.
MISCELLANEOUS


A FACTORIAL APPROACH TO CONSTRUCTION OF TRUE COST OF
LIVING INDEX AND ITS APPLICATION IN STUDIES OF
CHANGES IN NATIONAL INCOME


By K. S. BANERJEE
State Statistical Bureau, West Bengal


SUMMARY. A new formula for the true index of cost of living has been evolved through
a new formulation. Incidentally, it has been shown how the price and the quantity components of
value change in National Income can be evaluated.


1. INTRODUCTION 


А new formula has been evolved for the true index of cost of living through a factorial 
approach and this has been shown to possess all the properties of Wald’s true index (1939) 
as well as those of Frisch’s true index (1936) obtained by the double-expenditure method 
especially in so far as it lies between Staehle’s limits (1935). 


2. NEW FORMULATION 


An indifference-defined price index denotes the ratio of expenditures between two 
situations ensuring the same standard of living. Similarly, a quantity index also stands for 
the ratio of the two standards of living. As two points on an indifference surface are 
equivalent, the quantity index between such a pair will be unity. Conversely, if the 
quantity index be equal to unity, it implies that the standards of living in the two situations 
are the same. Moreover, the quantity index between two equivalent points $q_0$ and $q_1$ on
$I(q)$ will also be unity where $I(q)$ stands for the choice indicator describing the scale of
preference of an individual. If, therefore, a quantity $q_1$ could be found (by any formulation),
in situation 1 such that the quantity index $\kappa_{01}$ between $q_0$ and $q_1$ is unity, then $q_1$ can
be regarded as equivalent to $q_0$. The price index, $\pi_{01}$, found by such a joint formulation,
will represent the true index of cost of living of the period 1 compared to the period 0.


As an illustration of our formulation, we may consider Fisher's ideal formulae for
price and quantity:

    $F_p = \sqrt{\dfrac{V_{10}}{V_{00}} \cdot \dfrac{V_{11}}{V_{01}}}$    ... (2.1)

and $F_q = \sqrt{\dfrac{V_{01}}{V_{00}} \cdot \dfrac{V_{11}}{V_{10}}}$    ... (2.2)

where $V_{jk} = \sum_{i=1}^{n} p_j^i q_k^i \quad (j, k = 0, 1)$    ... (2.3)

is the value of $q_k = (q_k^1, q_k^2, \ldots, q_k^n)$ calculated at prices $p_j = (p_j^1, p_j^2, \ldots, p_j^n)$, and $F_p$ and $F_q$
are Fisher's ideal indexes for price and quantity respectively. If now $F_q = 1$, the price
index reduces to $V_{11}/V_{00}$.
The condition $F_q = 1$ implies

    $V_{11}V_{01} = V_{10}V_{00}$    ... (2.4)

a condition which was independently worked out by Frisch in his double-expenditure
approach (Frisch, 1936).
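The reduction of Fisher's price index to $V_{11}/V_{00}$ under $F_q = 1$ can be checked numerically. The sketch below uses made-up prices and quantities (they are assumptions, not data from the paper); the situation-1 quantities are scaled along a ray so that condition (2.4) holds.

```python
from math import isclose, sqrt


def values(p0, q0, p1, q1):
    """V[jk] = sum of p_j * q_k over commodities, as in (2.3)."""
    total = lambda p, q: sum(a * c for a, c in zip(p, q))
    return {"00": total(p0, q0), "01": total(p0, q1),
            "10": total(p1, q0), "11": total(p1, q1)}


def fisher(V):
    """Fisher's ideal price and quantity indexes, (2.1) and (2.2)."""
    price = sqrt((V["10"] / V["00"]) * (V["11"] / V["01"]))
    quantity = sqrt((V["01"] / V["00"]) * (V["11"] / V["10"]))
    return price, quantity


# Hypothetical two-commodity data.
p0, q0 = [1.0, 1.0], [2.0, 2.0]
p1 = [2.0, 1.0]
scale = sqrt(6.0 / 7.0)            # chosen so that V11 * V01 = V10 * V00
q1 = [3.0 * scale, 1.0 * scale]

V = values(p0, q0, p1, q1)
Fp, Fq = fisher(V)
print(Fq, Fp, V["11"] / V["00"])
```

With these numbers $F_q$ is 1 up to rounding and $F_p$ coincides with $V_{11}/V_{00}$. The product $F_p F_q$ equals the value ratio for any data, which is the factor reversal property.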


3. FACTORIAL APPROACH


Let $p_0^i$ and $q_0^i$ be the price and quantity of the $i$-th commodity in the situation 0,
and let $p_1^i$ and $q_1^i$ be the corresponding figures in the situation 1. From these two sets of
entries we obtain four values for the $i$-th item: $V_{00}^i$, $V_{01}^i$, $V_{10}^i$ and $V_{11}^i$, where, of course,
$V_{jk}^i = p_j^i q_k^i$ $(j, k = 0, 1)$. In analogy with a factorial experiment involving two factors at
two levels each, we have the price and quantity components as follows:

    $P^i = \tfrac{1}{2}[V_{11}^i - V_{01}^i + V_{10}^i - V_{00}^i] = \tfrac{1}{2}V_{00}^i(\pi^i - 1)(\kappa^i + 1)$    ... (3.1)

    $Q^i = \tfrac{1}{2}[V_{11}^i - V_{10}^i + V_{01}^i - V_{00}^i] = \tfrac{1}{2}V_{00}^i(\pi^i + 1)(\kappa^i - 1)$    ... (3.2)

where $\pi^i = p_1^i/p_0^i$ and $\kappa^i = q_1^i/q_0^i$ are the price and quantity indexes respectively of the $i$-th
item. It is easy to see that the value change, $V_{11}^i - V_{00}^i$, can simply be written as the sum of
the price and quantity components. That is,

    $V_{11}^i - V_{00}^i = P^i + Q^i$,    ... (3.3)

which when summed over all the $n$ commodities yields the following:

    $V_{11} - V_{00} = P + Q$,    ... (3.4)

where $P = \sum_{i=1}^{n} P^i$, $Q = \sum_{i=1}^{n} Q^i$ and $V_{jk} = \sum_{i=1}^{n} V_{jk}^i$.    ... (3.5)

If now we define $\pi$ and $\kappa$ as the general price and quantity indexes respectively we
can, by analogy of identities (3.1) and (3.2), write

    $P = \tfrac{1}{2}V_{00}(\pi - 1)(\kappa + 1)$,    ... (3.6)

    $Q = \tfrac{1}{2}V_{00}(\pi + 1)(\kappa - 1)$.    ... (3.7)

Following Stuvel (1957) we may equate these to $\sum P^i$ and $\sum Q^i$, where $P^i$ and $Q^i$
are defined by (3.1) and (3.2). That is,

    $\tfrac{1}{2}V_{00}(\pi - 1)(\kappa + 1) = \tfrac{1}{2}(V_{11} - V_{01} + V_{10} - V_{00})$    ... (3.8)

    $\tfrac{1}{2}V_{00}(\pi + 1)(\kappa - 1) = \tfrac{1}{2}(V_{11} - V_{10} + V_{01} - V_{00})$    ... (3.9)

or, alternatively, after removing the factor $\tfrac{1}{2}V_{00}$ from both sides,

    $(\pi - 1)(\kappa + 1) = \dfrac{V_{11}}{V_{00}} - L_q + L_p - 1$,    ... (3.10)

    $(\pi + 1)(\kappa - 1) = \dfrac{V_{11}}{V_{00}} - L_p + L_q - 1$,    ... (3.11)
where $V_{11}/V_{00}$ is the value index, $L_p$ and $L_q$ are the Laspeyres' price and quantity indexes, i.e.
$L_p = V_{10}/V_{00}$, $L_q = V_{01}/V_{00}$. From (3.8) and (3.9) we have

    $\pi - \kappa = (L_p - L_q)$    ... (3.12)

    $\pi\kappa = \dfrac{V_{11}}{V_{00}}$    ... (3.13)

from which the price and quantity indexes are obtained as

    $\pi = \dfrac{L_p - L_q}{2} + \sqrt{\Big(\dfrac{L_p - L_q}{2}\Big)^2 + \dfrac{V_{11}}{V_{00}}}$    ... (3.14)

    $\kappa = \dfrac{L_q - L_p}{2} + \sqrt{\Big(\dfrac{L_q - L_p}{2}\Big)^2 + \dfrac{V_{11}}{V_{00}}}$    ... (3.15)

On account of the symmetry in the relationship of $\pi$ and $\kappa$ in (3.14) and (3.15), the
indexes $\pi$ and $\kappa$ satisfy certain fundamental tests, such as the time reversal and the factor
reversal tests and are, therefore, preferred to other known formulae (Stuvel).
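The relations (3.1) to (3.15) can be traced on a small numerical example. The prices and quantities below are invented for this sketch, and the function names are illustrative; the indexes are computed from (3.14) and (3.15) and then checked against the item-level components.

```python
from math import isclose, sqrt


def stuvel(Lp, Lq, value_index):
    """General price and quantity indexes from (3.14) and (3.15)."""
    half = (Lp - Lq) / 2
    root = sqrt(half * half + value_index)
    return half + root, -half + root   # (pi, kappa)


# Hypothetical data for three commodities in situations 0 and 1.
p0, q0 = [2.0, 5.0, 1.0], [10.0, 4.0, 20.0]
p1, q1 = [3.0, 4.0, 2.0], [8.0, 5.0, 22.0]

total = lambda p, q: sum(a * c for a, c in zip(p, q))
V00, V01, V10, V11 = total(p0, q0), total(p0, q1), total(p1, q0), total(p1, q1)

# Item-level price and quantity components, (3.1) and (3.2), summed as in (3.5).
P_items = sum(0.5 * (a1 * c1 - a0 * c1 + a1 * c0 - a0 * c0)
              for a0, c0, a1, c1 in zip(p0, q0, p1, q1))
Q_items = sum(0.5 * (a1 * c1 - a1 * c0 + a0 * c1 - a0 * c0)
              for a0, c0, a1, c1 in zip(p0, q0, p1, q1))

Lp, Lq = V10 / V00, V01 / V00
pi, kappa = stuvel(Lp, Lq, V11 / V00)

# Components expressed through the general indexes, (3.6) and (3.7).
P = 0.5 * V00 * (pi - 1) * (kappa + 1)
Q = 0.5 * V00 * (pi + 1) * (kappa - 1)

print(pi, kappa, P, Q, V11 - V00)
```

The two routes to $P$ and $Q$ agree, their sum is the value change $V_{11} - V_{00}$, and the indexes satisfy $\pi - \kappa = L_p - L_q$ and $\pi\kappa = V_{11}/V_{00}$.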


The condition of equivalence in our formulation can be worked out by equating
the quantity index to unity. If this were done, i.e., if $\kappa = 1$, it would follow from (3.11)
that the quantity component in the value change is zero, i.e.,

    $V_{11} - V_{10} + V_{01} - V_{00} = 0$    ... (3.16)

and also from (3.13), $\pi = \dfrac{V_{11}}{V_{00}}$.    ... (3.17)

Relation (3.16) provides the necessary condition for determining the point $q_1$ such
that it is equivalent to $q_0$. This condition has a striking similarity with that of Frisch. In
the latter case, however, the condition may be written in terms of logarithms, i.e.,

    $\log V_{11} - \log V_{10} + \log V_{01} - \log V_{00} = 0$    ... (3.18)

which is, in fact, similar to (3.16).

The condition of equivalence could also be worked out from formulae other than
Stuvel's. In fact such conditions have already been derived in the case of Fisher's ideal
index which is rather a special case of the generalised indexes of Stuvel (Banerjee).
In particular cases of the generalised index, however, the conditions of equivalence take
complicated expressions.


4. THE NEW INDEX COMPARED WITH THOSE OF WALD AND FRISCH

We next compare our index obtained in (3.17) with Frisch's true index as also with
Wald's true index. Suppose the Engel curves $C_0$ and $C_1$ are linear and are given by the
following equations.

    $C_0 : q_0^i = a_0^i V + b_0^i$
    $C_1 : q_1^i = a_1^i V + b_1^i$    $(i = 1, 2, \ldots, n)$.    ... (4.1)

Let $\sum_{i=1}^{n} a_k^i p_j^i = a_{jk}$ and $\sum_{i=1}^{n} b_k^i p_j^i = b_{jk}$ $(j, k = 0, 1)$.    ... (4.2)

Then

    $a_{jj} = 1, \quad b_{jj} = 0 \quad (j = 0, 1)$.

Also $\sum_{i=1}^{n} p_0^i q_0^i = V_{00}$, $V_{01} = a_{01}V_{11} + b_{01}$, $V_{10} = a_{10}V_{00} + b_{10}$.

In terms of the above coefficients, Frisch's index is given by

    $F = \dfrac{-b_{01} + \sqrt{b_{01}^2 + 4a_{01}a_{10}V_{00}^2 + 4a_{01}b_{10}V_{00}}}{2a_{01}V_{00}}$    ... (4.3)



and, Wald's true index is given by

    $W = \sqrt{\dfrac{a_{10}}{a_{01}}} + \dfrac{1}{V_{00}} \cdot \dfrac{b_{10} - b_{01}\sqrt{a_{10}/a_{01}}}{1 + \sqrt{a_{01}a_{10}}}$    ... (4.4)


while our index, obtained through the condition (3.16) of equivalence, is obtained as

    $B = \dfrac{1 + a_{10}}{1 + a_{01}} + \dfrac{1}{V_{00}} \cdot \dfrac{b_{10} - b_{01}}{1 + a_{01}}$.    ... (4.5)

Following the same line of argument, as adopted by Wald, it can be shown that our index
$B$ also lies between the limits of Staehle (see Wald).
That is, 


(4.6) 


where Рр = Ро, Or, in terms of the coefficients of the Engel curves, the new index lies 
‘between, 


and ant №. хил) 
00 


When, however, $a_{01} = a_{10}$, we see $W = B$. If, in addition, $b_{01} = b_{10}$, we see $F = W = B = 1$.


Although formulae (4.3)-(4.5) are exact in terms of the coefficients of the Engel
curves, yet in practice, the indexes have to be computed from estimates of the coefficients
of the Engel curves which are subject to errors of estimation. The error of these indexes
can, of course, be determined taking $p_0$ and $p_1$ as given, since the standard errors of the
regression coefficients are known.
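Given estimated Engel coefficients, the index defined by the equivalence condition (3.16) can be obtained by solving for the situation-1 expenditure directly. The sketch below rests on two assumptions that should be checked against the original displays: that $V_{01} = a_{01}V_{11} + b_{01}$ and $V_{10} = a_{10}V_{00} + b_{10}$, and that Frisch's index is the positive root of condition (2.4) on the same curves. The coefficient values are invented for illustration.

```python
from math import isclose, sqrt


def new_index(a01, b01, a10, b10, V00):
    """Ratio V11/V00 with V11 solving V11 - V10 + V01 - V00 = 0, i.e. (3.16)."""
    V11 = ((1 + a10) * V00 + b10 - b01) / (1 + a01)
    return V11 / V00


def frisch_index(a01, b01, a10, b10, V00):
    """Ratio V11/V00 with V11 the positive root of V11 * V01 = V10 * V00, i.e. (2.4)."""
    disc = b01 ** 2 + 4 * a01 * V00 * (a10 * V00 + b10)
    return (-b01 + sqrt(disc)) / (2 * a01) / V00


# Hypothetical coefficients (assumed values).
a01, b01, a10, b10, V00 = 0.8, 5.0, 1.3, -4.0, 100.0

B = new_index(a01, b01, a10, b10, V00)
F = frisch_index(a01, b01, a10, b10, V00)

# Recover the implied values and confirm each equivalence condition.
V10 = a10 * V00 + b10
V11_B, V11_F = B * V00, F * V00
V01_B, V01_F = a01 * V11_B + b01, a01 * V11_F + b01

print(B, F)
```

With equal slopes and equal intercepts on the two curves both functions return 1, matching the remark in the text.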
Even if the point $q_1$ is such that $Q$, i.e., the quantity component in the value change,
is not exactly equal to zero, the price formula given in (3.14) may still be considered as a good
approximation to the true index, provided the quantity component in the value change
did not differ from zero significantly. Tests of significance similar to those applied to factorial
experiments may perhaps be applied in our case also, though with caution, in view of the
non-experimental nature of economic data.

In the next section we consider a practical situation for which our methods could
be applied with some advantages in explaining value changes in terms of price and quantity
variations.


5. PRICE AND QUANTITY COMPONENTS IN VALUE CHANGE OF NATIONAL INCOME


Estimates of national income are available for the Indian Union in a series of papers
entitled "Estimates of National Income", published by the Central Statistical Organisation.
These papers show, besides other particulars, a statement on the movement of net national
output at factor cost from year to year. Comparison of the national output is made possible,
as usual, through the indexes of national output provided at current prices (indexes of value)
and constant prices (indexes of output or quantity). These indexes furnish, of course, a
basis of comparison, but a complete picture of the movement of national income will not
be available unless it is known how much of the change in value is due to change in price
and how much to change in output (quantity).


We shall now apply formulae (3.4)-(3.7) to Indian national income data to explain
changes in the national income in terms of changes in prices and also changes in the volume
of national output.

In "Estimates of National Income", the values of $L_q$ (quantity index) and $V_{11}/V_{00}$
(value index) are provided, but not the values of $L_p$ (price index). For $L_p$ (Laspeyres'
price index), we may take the wholesale price indexes* for India perhaps as close substitutes.
These price indexes (base 1939 = 100) are not, of course, identical with the price indexes
which would have been obtained from the statistics utilised in the estimation of national
income. However, the order of disagreement between the two is probably not so large
as to vitiate the study to an unacceptable extent, or to make the computations absolutely
hypothetical.

The computed price components and quantity components in value change are
given in the following Table for the period 1949-50 to 1957-58. In the Table, the values
of $L_q$ (Col. 3) and $V_{11}/V_{00}$ (Col. 4) have been taken from the papers on national income,
while the values of $L_p$ (Col. 2), as stated before, are the wholesale price indexes obtained
as simple averages of the monthly indexes for 12 months of the corresponding fiscal years;
$\pi$ and $\kappa$ (Cols. 6 and 7 respectively) are the adjusted price and quantity indexes. Column
5 shows the percentage change of value of net national output as given by

    $100\Big(\dfrac{V_{11}}{V_{00}} - 1\Big)$.    ... (5.1)


*Vide reports issued by the Office of the Economic Adviser, Government of India.
TABLE 1. PRICE AND QUANTITY COMPONENTS


    years      L_p     L_q    100 V11/V00   100(V11/V00 - 1)     pi     kappa     P       Q
     (1)       (2)     (3)        (4)             (5)            (6)     (7)     (8)     (9)
    1948-49   100.0   100.0      100.0            0.0           100.0   100.0    0.0     0.0
    1949-50   103.0   102.0      104.2            4.2           102.6   101.6    2.6     1.6
    1950-51   109.6   102.3      110.2           10.2           108.7   101.4    8.7     1.5
    1951-52   116.3   105.2      115.3           15.3           113.1   102.0   13.2     2.1
    1952-53   101.8   109.4      113.5           13.5           102.8   110.4    3.0    10.5
    1953-54   106.3   116.0      121.2           21.2           105.4   115.1    5.8    15.4
    1954-55   101.0   118.8      111.1           11.1            96.4   114.2   -3.4    14.5
    1955-56    96.4   121.2      115.4           15.4            95.7   120.5   -4.7    20.1
    1956-57   107.1   127.2      130.8           30.8           104.8   124.9    5.3    25.5
    1957-58   110.3   125.2      131.3           31.3           107.4   122.3    8.2    23.1


The percentage difference as shown in Col. 5 is obviously the sum of the components 
in Cols. 8 and 9. 


Table 1 shows that during the years 1954-55 and 1955-56, the price component,
that is, the contribution made by price to value change, has been negative, as the
corresponding price index (adjusted) is less than 100. As a result, a part of the contribution made
by quantity (output) to the increase in value of national income during those years has been
masked by the depressing price effect.
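Columns 6 to 9 of Table 1 follow from columns 2 to 4 through (3.14), (3.15), (3.6) and (3.7) with $V_{00}$ taken as 100. The sketch below recomputes three rows; the function name `decompose` is illustrative, and the inputs are copied from the Table.

```python
from math import sqrt

# (year, L_p, L_q, value index), all as percentages, copied from Table 1.
rows = [
    ("1949-50", 103.0, 102.0, 104.2),
    ("1955-56", 96.4, 121.2, 115.4),
    ("1957-58", 110.3, 125.2, 131.3),
]


def decompose(Lp, Lq, value):
    """Return (100*pi, 100*kappa, P, Q), rounded to one decimal as in the Table."""
    lp, lq, val = Lp / 100, Lq / 100, value / 100
    half = (lp - lq) / 2
    root = sqrt(half * half + val)
    pi, kappa = half + root, -half + root     # (3.14), (3.15)
    P = 50 * (pi - 1) * (kappa + 1)           # (3.6) with V00 = 100
    Q = 50 * (pi + 1) * (kappa - 1)           # (3.7) with V00 = 100
    return tuple(round(x, 1) for x in (100 * pi, 100 * kappa, P, Q))


for year, Lp, Lq, value in rows:
    print(year, decompose(Lp, Lq, value))
```

For these three years the recomputed figures agree with the tabulated entries, and in each case $P + Q$ reproduces column 5.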


6. GENERALISATION OF TWO-FACTOR INDEX NUMBERS 


Index numbers for price (p) and quantity (q) may be regarded
as two-dimensional, as only two factors, price and quantity, are involved, e.g., Laspeyres',
Paasche's or Fisher's index. Multidimensional index numbers involving more than two
factors could also be constructed by a direct generalisation of (3.14) and (3.15). For example,
if p, q, r be the factors observed each at two levels 0 and 1, the three-dimensional index for
p may be written as


т = lola — Larg) + V (gig — тв) 2-Е рд, -- (6.1)
where Lyr = Viool Уш» шр» = Vor! ооо
До (6.2) 
Ror = Уш! Уи» 
the V's being defined by V_{jkl} = \sum p_j q_k r_l, (j, k, l = 0, 1). ... (6.3)


Similarly, three-dimensional indexes for q and r can be defined. A mention may
perhaps be made of the work of Wisniewski (1931), Siegel (1945) and Gini (1937) who extended
Fisher's two-factor formula to three or more factors. Siegel suggested an interesting
application of the three-dimensional index, viz., the treatment of changes in wages as a result
of changes in labour cost per unit of output, man-hour productivity and man-hours worked.
Wisniewski gave indexes of price, area harvested, yield rate and crop value, where crop
value was taken as the product of price (per bushel), area harvested (in acres) and yield rate
(bushels per acre). Gini considered an example in anthropometry, i.e., an analysis of
variations in the volume of the chest in terms of the variations in antero-posterior diameter,
transverse diameter of the thorax and sternal height. The generalised formulae which they
used appear to be much more complicated in form than those suggested in (6.1) based on the
analogy of 2^n-factorial experiments.


In the case of three or more factors, however, we cannot generalise the result of (3.4); 
for instance, for three factors 


V_{111} − V_{000} = P + Q + R + PQR ... (6.4)


i.e., the value change is not equal to the sum of the main components unless it is known a priori
that the second order interaction (PQR) is negligible. Similar difficulties arise in the case of
four or more factors. Those difficulties could perhaps be got over and exactly similar formulae
worked out by assuming that higher order interactions (that is, interactions higher than
three-factor interactions) are in practice not very important and could, therefore, be ignored.
Moreover, in economic applications such high order interactions do not appear to have any
obvious physical interpretation.
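
The role of the three-factor interaction can be illustrated numerically. The sketch below assumes the usual 2^3 factorial definitions, in which each main effect and the interaction PQR are contrasts of the eight aggregates V_{jkl} with divisor 4; this normalisation is an assumption made for illustration and may differ from the one adopted in Section 3 (for instance, by a division by V_{000}). The prices, quantities and third-factor values are invented.

```python
import itertools


def value_aggregates(p, q, r):
    """V[(j, k, l)] = sum over items of p_j * q_k * r_l, for j, k, l in {0, 1}."""
    items = range(len(p[0]))
    return {
        (j, k, l): sum(p[j][i] * q[k][i] * r[l][i] for i in items)
        for j, k, l in itertools.product((0, 1), repeat=3)
    }


def factorial_components(V):
    """Main effects P, Q, R and interaction PQR (assumed divisor 4)."""
    def sign(level):
        return 1 if level else -1

    def contrast(weight):
        return sum(weight(j, k, l) * v for (j, k, l), v in V.items()) / 4.0

    P = contrast(lambda j, k, l: sign(j))
    Q = contrast(lambda j, k, l: sign(k))
    R = contrast(lambda j, k, l: sign(l))
    PQR = contrast(lambda j, k, l: sign(j) * sign(k) * sign(l))
    return P, Q, R, PQR


if __name__ == "__main__":
    # Hypothetical data: two items observed at levels 0 and 1 of each factor.
    p = [[1.0, 2.0], [1.2, 2.5]]
    q = [[10.0, 5.0], [11.0, 4.0]]
    r = [[3.0, 1.0], [3.5, 1.5]]
    V = value_aggregates(p, q, r)
    P, Q, R, PQR = factorial_components(V)
    print("value change   :", V[(1, 1, 1)] - V[(0, 0, 0)])
    print("P + Q + R      :", P + Q + R)
    print("P + Q + R + PQR:", P + Q + R + PQR)
```

Under these definitions the value change equals P + Q + R + PQR exactly, so the sum of the main components alone falls short of it by the interaction term.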


Assistance of N. S. Iyenger is gratefully acknowledged. His suggestions have been
very helpful.

GENERATION OF RANDOM PERMUTATIONS OF GIVEN NUMBER
OF ELEMENTS USING RANDOM SAMPLING NUMBERS 


By C. RADHAKRISHNA RAO
Indian Statistical Institute 


SUMMARY. A general method is given for generating random permutations of integers 
using a table of random sampling numbers and without wasting the random numbers read. This is 
more convenient in practice, specially when random permutations of large numbers of elements are 
needed. It is suggested that even for permutations of small numbers, the method offers greater 
scope than consulting a table of a limited number of random permutations. 


INTRODUCTION 


Fisher and Yates (1957) in the introduction to their tables give several methods of
obtaining random permutations of integers 1 to n. Some of these methods require random
permutations of subsets of integers to build up the whole permutation. A direct method
for obtaining a random permutation of integers 1 to n (or 0 to n−1) is to choose k columns
in a table of random digits, where 10^k > (n−1), and write down the numbers 0 to (n−1) in the
order they occur, omitting all those above (n−1) and ignoring repetitions. Thus for obtaining
a random permutation of numbers 0 to 24 we would take two columns; and the process would
naturally reject on an average 75 per cent of the random numbers read. Further, if n is large
there is the difficulty of remembering the integers that have not occurred up to any stage
while reading the random numbers.
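
The 75 per cent figure follows from simple counting: with k columns there are 10^k equally likely numbers, of which only n are usable. A small Python sketch of this arithmetic is given below; the function names are illustrative, and the calculation counts only numbers above (n−1), not the further loss through repetitions.

```python
def columns_needed(n):
    """Smallest k with 10**k > n - 1."""
    k = 1
    while 10 ** k <= n - 1:
        k += 1
    return k


def fraction_rejected(n):
    """Average fraction of k-digit random numbers lying above n - 1."""
    return 1 - n / 10 ** columns_needed(n)


if __name__ == "__main__":
    for n in (7, 25, 100, 101):
        print(n, columns_needed(n), fraction_rejected(n))
```

For n = 25 the sketch gives k = 2 and a rejected fraction of 0.75, in agreement with the text.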


In this note, I give a general method of generating random permutations of integers
1 to n for any n, which does not waste any random number read, and which can be
conveniently used even for large n.


ONE-WAY CLASSIFICATION METHOD


Choose a column in a table of random digits and read the digits from any starting point.
Take the classes defined by the digits 0 to 9 and write 1 in the class of the first random number,
2 in that of the second number, 3 in that of the third number and so on up to n. For instance,
if n = 7 and the random numbers read are 5, 3, 9, 1, 2, 3, 1 the integers 1 to 7 are recorded
as shown in the first block of Table 1.


TABLE 1. ONE-WAY CLASSIFICATION METHOD


                              classes defined by digits
                     0      1      2      3      4      5      6      7      8      9
integers 1 to 7           4, 7     5    2, 6            1                           3
subset (4, 7)                      4             7
subset (2, 6)               6                                                       2

To obtain a random permutation of integers 1 to 7, first write down the groups of integers in
the different classes one after the other starting from the '0' class, which in the present example
leads to

(4, 7), 5, (2, 6), 1, 3.


Then randomly permute the subsets within the brackets and remove the brackets. The
problem of obtaining random permutations for large numbers is thus reduced to that of
obtaining them for subsets, the expected number in each subset being equal to 1/10 of the original
number. The same process may be used for obtaining permutations of integers within
subsets as shown in the last two blocks of Table 1, unless there is a ready way of obtaining
these permutations such as referring to tables of random permutations. To permute the
subsets continue to read the random numbers. Let the next digits be 2, 4, 9, 1, 8, 0, ....
Consider the subset (4, 7); 4 goes in class 2 and 7 in class 4, which completes the permutation.
Similarly a permutation of (2, 6) is derived in the last block of Table 1, and the complete
permutation is obtained as
4, 7, 5, 6, 2, 1, 3.
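
The one-way procedure lends itself to a short recursive sketch in Python. The function below is an illustration rather than part of the original note: `one_way_permutation` and its arguments are names chosen here, and the stream of random digits is supplied by the caller (read from a table, or, as an assumption for modern use, drawn from a pseudo-random generator). Subsets with multiple entries are permuted by the same process, continuing to read digits from the same stream.

```python
import random


def one_way_permutation(items, digits):
    """Permute `items` by the one-way classification method.

    `digits` is an iterator yielding random digits 0-9; one digit is
    consumed for every item posted, so none is wasted.
    """
    classes = [[] for _ in range(10)]
    for item in items:
        classes[next(digits)].append(item)   # post item in the class of its digit
    permutation = []
    for members in classes:                  # read out starting from the '0' class
        if len(members) > 1:
            # Multiple entries: repeat the process on the subset.
            permutation.extend(one_way_permutation(members, digits))
        else:
            permutation.extend(members)
    return permutation


if __name__ == "__main__":
    # The worked example: digits 5, 3, 9, 1, 2, 3, 1 followed by 2, 4, 9, 1, 8, 0.
    table_digits = iter([5, 3, 9, 1, 2, 3, 1, 2, 4, 9, 1, 8, 0])
    print(one_way_permutation(range(1, 8), table_digits))

    # With a pseudo-random generator standing in for the table of digits.
    rng = random.Random(0)
    stream = iter(lambda: rng.randrange(10), None)
    print(one_way_permutation(range(1, 26), stream))
```

With the digits of the worked example the sketch returns 4, 7, 5, 6, 2, 1, 3, as in the text.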


If permutations for large values of n are needed, several classes will have multiple entries,
needing one or more repetitions of the process for each subset. This can be avoided to a large
extent by following the two-way classification method and, if convenient, by adopting the
built-in procedure of simultaneously permuting within subsets while recording the integers,
as detailed in the next section.


TWO-WAY CLASSIFICATION METHOD

Choose two columns in a table of random digits. Each random number would now
consist of two digits which define a cell in a 10 x 10 two-way table as shown in Table 2.

TABLE 2. TWO-WAY CLASSIFICATION METHOD

[10 x 10 table: rows defined by the first digit, columns by the second digit.]

Let a random permutation of numbers 1 to 18 be needed. Write 1 in the cell of the first random
number, 2 in the cell of the second number, ... and so on up to 18. For instance, corresponding
to the first five random numbers 31, 17, 81, 45, 31 the entries 1 to 5 would be as shown
in Table 2. Proceeding further in the same manner, the first 18 random numbers are used
to post all the integers 1 to 18 in the cells. In two cells there are two entries, which could
have been avoided by omitting a number if its cell is already filled. Otherwise the integers
within subsets (1, 5) and (4, 16) have to be randomly permuted first. Suppose (5, 1) and (4, 16)
are the permutations arrived at by some process. The entire permutation could be read
out in any convenient manner, say from left to right row by row, the order of integers in cells
with multiple entries being determined by independent permutations. In the present example
the permutation is


9, 10, 2, 6, 12, 5, 1, 7, 11, 4, 16, 17, 8, 15, 13, 18, 3, 14. 


The problem of multiple entries, due to repetition of random numbers, is not serious
since the chance of having more than two or three entries in a cell is small if n is not very
large, and it is easy to obtain a random permutation of two or three integers.

It is also possible for the random permutation of multiple entries to be built into the
process of entering the numbers in the cells as follows. For instance, when 5 has to be placed
in the cell (31), which already has 1, read the digit in the same row containing 31 in an adjacent
column of the random number table. If this digit is even, write 5 (the second number to
be placed in the cell) before 1 (the first number), and if odd, write 5 after 1. Random
permutation of 1, 5 is already secured by this operation, and similarly the order of integers in
(4, 16) could have been settled while posting 16 and the whole random permutation is
obtained straight away from the two-way table. If a random number repeats for a third time,
first read the digit in the same row in an adjacent column. If this is 0, pass on to the next
column, and so on, till a non-zero digit is obtained from the row. The number to be written
in the cell is put in the first position if the non-zero digit read from an adjacent column is
≡ 0 (mod 3), in the second position if ≡ 1 (mod 3), and in the third position if ≡ 2 (mod 3).
A similar device could be adopted in case a random number repeats for a fourth time, and so
on. As stated earlier, the permutation of multiple entries in the cells may be undertaken
at the final stage, if that is more convenient.
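
The two-way method with the built-in permutation of multiple entries can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure verbatim: a pseudo-random generator stands in for the table of random digits, and the parity and mod 3 devices are replaced by their natural generalisation, in which the m-th integer to arrive in a cell is inserted in one of the m possible positions with equal probability. The name `two_way_permutation` is chosen here.

```python
import random


def two_way_permutation(n, rng):
    """Random permutation of 1..n by the two-way classification method."""
    cells = {}  # (first digit, second digit) -> integers posted in that cell
    for integer in range(1, n + 1):
        number = rng.randrange(100)                  # two-digit random number 00-99
        cell = cells.setdefault(divmod(number, 10), [])
        # Built-in permutation: choose one of the len(cell) + 1 positions
        # at random for the new arrival.
        cell.insert(rng.randrange(len(cell) + 1), integer)
    permutation = []
    for key in sorted(cells):                        # read out row by row
        permutation.extend(cells[key])
    return permutation


if __name__ == "__main__":
    print(two_way_permutation(18, random.Random(1961)))
```

Each integer consumes exactly one two-digit number, plus one further draw only when its cell is already occupied, so that the order within every cell is settled at the time of posting.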

For larger values of n, to avoid multiple entries, one could use three columns of a
random number table and consider a triple coordinate system to identify 1000 cells of a
three-way table. The best way of representing would be a triple classification table
involving 10 two-way sub-tables, each sub-table corresponding to a given value of the third digit.
The procedure outlined before can then be followed.

It is important to note that for any value of n, one can use any arbitrary number k
of columns of the random digits and proceed to distribute the integers in the cells of a k-way
table. But it may be convenient to use the one-way classification method for n < 10, the two-way
classification method for n < 100 and so on.

In this connection, the reader is referred to two papers by Lahiri (1951) and Matthai
(1953) on choosing, from a list of sampling units, with unequal and equal probabilities
respectively, a random sample with the help of random sampling numbers.

CORRIGENDA


Tables of Random Normal Deviates: By J. M. Sengupta and N. Bhattacharya,
Sankhyā, 20, 249-286.
(1) Plate 2, col. 8, line 28: should read + 2.260 and not + 2.268.
(2) Plate 2, col. 8, line 29: should read + 1.195 and not + 1.190.
(3) Plate 2, col. 8, line 30: should read + 1.298 and not 1.295.


On Some Classification Problems-I: By S. John, Sankhyā, 22, 301-308.


Expression (2.11): First factor is f (и), и). 


Equation (3.8): .The numerator of the last factor of the summand should be 


А р-і 25--в--2 
("г ү) 
On Some Classification Statistics: By S. John, Sankhyā, 22, 309-316.


Equation (8.2): Reverse the inequality sign.
Equation (5.3): Change 68 to 205.

THE TRANSFORMATION OF A DISTRIBUTION UNDER SELECTION


By D. J. FINNEY
University of Aberdeen, and Agricultural Research Council Unit of Statistics

SUMMARY. For any bivariate frequency distribution, specified by its cumulants, results
have been obtained pertaining to the effect on the distribution of the first variate produced by selection
in respect of the second variate. The most general rule of selection studied is that which defines the
probability of selection as a function of the second variate; and the moment generating function of the first
variate after selection is stated as equation (5.5). This concise formula appears to be of little direct use
for calculation, though an interesting expansion in series is obtained when the initial distribution is
bivariate normal, equations (4.10), (4.16), and (4.19).

The main task of the latter part of the paper has been to obtain explicit expressions for the first
four moments of the first variate after selection, in the case of prime importance, namely "cut-off
selection" in which all values above a certain limit are taken and all others discarded. The moments have
been given in two forms, in terms of the cut-off point and in terms of the proportion selected and the
corresponding normal deviate.


1. INTRODUCTION


In an earlier paper (Finney, 1956), I have examined the form of the frequency
distribution obtained when a variate that can be measured only with the inclusion
of a normally distributed error is subjected to a "cut-off" selection based upon the measured
values. That is to say, each individual measured is rejected unless its measurement
exceeds an arbitrary quantity, and the interest then lies in the frequency distribution
of true values among those selected. The arbitrary cut-off point was there defined
in terms of the proportion of the whole population to be selected, and expansions
in series for the first four moments of the true values selected were obtained. This is
a situation that arises in many practical circumstances. For example, in the selection
of crop or animal strains or varieties (Cochran, 1951; Finney, 1958; Robertson, 1957),
the variate whose values are to be increased by selection will be the yielding capacity
of a genotype; but decisions must be made on the basis of measured performance.
Again, in the selection of children for secondary education, innate ability may be the
character of real interest, but the evidence on which decisions are to be taken will
consist of scores in examinations and tests of intelligence and skill. The assumption
that the measured value deviates from the true by a normally distributed error is
likely to be adequate for the theoretical study of such problems.

Results of greater generality may sometimes be useful. With this in view,
not only has the restriction to a normal error distribution been removed but also a
less rigid process of selection than a simple cut-off has been studied. Certain formal
results of greater generality than before have been derived.

2. SPECIFICATION OF INITIAL DISTRIBUTIONS


For the concise expression of general results, the notation of the previous
paper needs to be substantially modified. Suppose x, y are two variates with probability
density function p(x, y) for which all cumulants, κ_{ij}, exist. Then, with the
convention that all integrals are over the interval (−∞, ∞) unless the contrary is
stated*,


\int\int exp(θx + φy) p(x, y) dx dy = M_{xy}(θ, φ)

                                    = exp{ \sum_i \sum_j κ_{ij} θ^i φ^j / (i! j!) } ... (2.1)


defines the relation between the p.d.f., the moment generating function, and the cumulants.
(Throughout the paper, θ and φ are used in preference to iθ and iφ for simplicity
of writing, as this makes no essential change in the mathematical development.)


Here x is to be regarded as the variate in which an investigator is primarily interested,
but selection is to be based on the values of y and the effect of this selection on the
distribution of x is then studied. The situation for which this theory was developed in
1956 was that of x representing the genotypic value of some character for an individual,
and y representing the phenotypic expression of this character; y can be measured
directly, but individual values of x cannot be measured and the nature of the
distribution of x after selection can only be inferred from that of y.


One important class of problems, to which the genetic situation just mentioned
belongs, has

y = x + w, ... (2.2)

where w is an error of observation or measurement from which x cannot be freed.


The 1956 paper was restricted to this model, with w normally distributed and
independent of x. The theory that follows is applicable whatever the distribution of w
and however x and w may be related as long as all cumulants exist. If M_{xw}(θ, φ)
is the m.g.f. of the joint distribution of x and w,


M_{xw}(θ, φ) = \int\int exp(θx + φw) p(x, x + w) dx dw
             = M_{xy}(θ − φ, φ). ... (2.3)


*If the ranges permitted to the variates are finite, the limits of integration will be appropriately
modified and similar changes will affect definitions such as equation (3.1).

If now κ'_{ij} denotes the cumulants of x and w,


\sum_i \sum_j κ'_{ij} θ^i φ^j / (i! j!) = \sum_i \sum_j κ_{ij} (θ − φ)^i φ^j / (i! j!), ... (2.4)


and equation of coefficients enables the κ'_{ij} to be expressed in terms of the κ_{ij}. Thereafter
formulae appropriate to the model indicated by (2.2) are based upon the general
results of this paper.


Without real loss of generality, means may be taken as origins, so that


κ_{10} = 0, κ_{01} = 0. ... (2.5)

A further convenient symbolism is

κ_{20} = σ^2, κ_{11} = ρσω, κ_{02} = ω^2, ... (2.6)


σ^2 and ω^2 being the variances of x and y and ρ their correlation coefficient. A
generalization of a result stated by Cornish and Fisher (1937) enables p(x, y) to be expressed
formally in terms of the cumulants and the differential operators


D_x = ∂/∂x, D_y = ∂/∂y; ... (2.7)

this is

p(x, y) = exp[−(1/2)(σ^2 D_x^2 + 2ρσω D_x D_y + ω^2 D_y^2)] M_{xy}(−D_x, −D_y)
          × {2πσω(1 − ρ^2)^{1/2}}^{-1} exp[−{x^2/σ^2 − 2ρxy/(σω) + y^2/ω^2} / {2(1 − ρ^2)}]. ... (2.8)

Also, the marginal p.d.f. for y alone is

p(y) = exp(−(1/2)ω^2 D_y^2) M_{xy}(0, −D_y) (2π)^{-1/2} ω^{-1} exp(−y^2/(2ω^2)). ... (2.9)


The m.g.f. of x corresponding to any specified y, M_{x|y}(θ), can now
be concisely expressed by means of Bartlett's theorem (1938a):


M_{x|y}(θ) p(y) = {M_{xy}(θ, −D_y) / M_{xy}(0, −D_y)} p(y)

                = exp(−(1/2)ω^2 D_y^2) M_{xy}(θ, −D_y) (2π)^{-1/2} ω^{-1} exp(−y^2/(2ω^2)). ... (2.10)
The variates may be standardized, so as to have unit standard deviations
initially, by writing

u = y/ω, ... (2.11)
D = d/du = ωD_y, say, ... (2.12)
γ_{ij} = κ_{ij} / (σ^i ω^j). ... (2.13)

In particular, of course, γ_{10} = γ_{01} = 0, γ_{20} = γ_{02} = 1, and γ_{11} = ρ. Then


M_{x|y}(θ) p(y) dy = exp(−(1/2)D^2) exp[ \sum_i \sum_j {γ_{ij} / (i! j!)} (σθ)^i (−D)^j ] (2π)^{-1/2} exp(−u^2/2) du. ... (2.14)

3. THE SELECTOR FUNCTION


A general process of selection operating on y (either for the whole population
or for a random sample thereof) may be represented by the probability that an
individual with a particular value of y is selected. This quantity, hereafter referred to simply
as the selector, will be written as a function h(y) such that


0 ≤ h(y) ≤ 1 (−∞ < y < ∞) ... (3.1)

and, without loss of generality,


ж 


ENT nea 


because only the proportions in which different values of y are selected affect results.
The selector is not itself a p.d.f. Its definition means that the p.d.f. of Y, which symbol
will be used to denote y in the selected portion of the population, is p_s(Y), where


P = \int h(y) p(y) dy ... (3.3)


is the proportion of the whole population (or the expected proportion of the measured
random sample) that is selected and


p_s(Y) = P^{-1} h(Y) p(Y). ... (3.4)

The problem under discussion is that of finding properties of the frequency distribution
of X, the value of x in the selected portion of the population.
The concept of a selector includes the cut-off selection of the previous paper,
for which


h(y) = 0 (y < η),
     = 1 (y ≥ η), ... (3.5)

and η is defined so as to ensure that a specified proportion of the y-distribution is selected.
A selector corresponding to a double cut-off may also be of interest; it is

h(y) = 1 (η_1 ≤ y ≤ η_2),
     = 0 (y < η_1 or y > η_2), ... (3.6)

if the central portion is to be that selected, or this with all inequalities reversed if
the two tails are to be selected. The interpretation of general results so as to apply
to selectors such as (3.5) and (3.6) must be undertaken with care because of
discontinuities.
4. THE NORMAL DISTRIBUTION

The simplest version of the general problem is of intrinsic importance and also
of interest as indicating the type of result that is wanted. It is however scarcely
typical of the complexity that will be encountered later. Suppose that p(x, y) is a
bivariate normal p.d.f., so that

κ_{ij} = 0 for i + j > 2. ... (4.1)

Then the cumulant function for X can readily be expanded as a power series in θ:
log Mx(0) = 400° —10р Hy-+log{E{h(u-+Bo6))} 


= 40204 $ (дыр Е.) = vs (4.10) 
1 Н 
Hence the cumulants of x for the distribution of selected individuals are 


„g — Bok, 
К = E 


0 


, 


4.11 
Ki ot XD ED E 


К; = (fo)'Ai(log Ey) for i> 2. J 
An alternative expression for E_i deserves noting. If


Z(u) = (2π)^{-1/2} exp(−u^2/2), ... (4.12)



then 02 Е; = [М(и) Z(u)du = f Ми) (—D)'Z(u)du, ... (4.13) 


this following from i successive integrations by parts, 


If now the selector (3.5) is considered, evidently 
LJ 
= [-Z(uu, ... (4.14) 
0 


where U = η/ω. ... (4.15)
By the use of (4.13), the discontinuity of the selector at u = U is seen to lead to


d 
В = Р, E, = ZU), А = dU- 
"Therefore, (4.11) can be written 
d? 


Къ = g?-- Bra? dUs 


(log P), 
(4.16) 


K; = (- f) © (log P) Eti 


This agrees exactly with equations (12) of the previous paper, although the definition
of β is somewhat wider. The model previously used was based upon equation (2.2),
a special case of that now under discussion; for it


κ_{11} = σ^2, so that ρ = σ/ω, and β = ρ^2.

The selector (3.6), which might be used if the aim of selection were to
homogenize the population by rejection of extreme values rather than to increase the mean,
gives similar formulae. If


U_1 = η_1/ω, U_2 = η_2/ω, ... (4.17)

then P = \int_{U_1}^{U_2} Z(u) du. ... (4.18)


Hence


E, =P, E, = Z(U)— Z(U3), 


д д 
апа A= Эй, = aU, 5 
Consequently K= оз e –,9. үр log Р 
a ôU, 


- д УА 
= (bart dest segs p 
E, ^ (бу; ША Joc Я 


for 9—1, 3,4,... 


... (4.19) 
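
For the bivariate normal case with a single cut-off, the behaviour of the mean can be checked by simulation. The sketch below rests on a standard property of the truncated bivariate normal distribution, stated here as an assumption rather than quoted from the equations above: when all individuals with u = y/ω ≥ U are selected, the mean of X is ρσZ(U)/P, where P is the proportion selected. The function names and the parameter values are illustrative only.

```python
import math
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def cutoff_for_proportion(proportion):
    """Standardized cut-off U such that a fraction `proportion` lies above it."""
    return STANDARD_NORMAL.inv_cdf(1.0 - proportion)


def selected_mean_formula(rho, sigma, proportion):
    """Mean of X after cut-off selection: rho * sigma * Z(U) / P (assumed result)."""
    U = cutoff_for_proportion(proportion)
    return rho * sigma * STANDARD_NORMAL.pdf(U) / proportion


def selected_mean_simulated(rho, sigma, proportion, n, seed):
    """Monte Carlo estimate of the same mean from n bivariate normal pairs."""
    rng = random.Random(seed)
    U = cutoff_for_proportion(proportion)
    total, count = 0.0, 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                      # standardized y
        e = rng.gauss(0.0, 1.0)                      # independent of u
        x = sigma * (rho * u + math.sqrt(1.0 - rho * rho) * e)
        if u >= U:                                   # cut-off selection on y
            total += x
            count += 1
    return total / count


if __name__ == "__main__":
    rho, sigma, proportion = 0.6, 2.0, 0.2
    print("formula   :", selected_mean_formula(rho, sigma, proportion))
    print("simulation:", selected_mean_simulated(rho, sigma, proportion, 200_000, 1))
```

The two figures should agree to within the sampling error of the simulation; a corresponding check for the double cut-off (3.6) would replace the selection rule by U_1 ≤ u ≤ U_2.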


5. GENERAL THEORY 


When p(x, y) is completely general in form, the method of approach to the
properties of the distribution of X must be changed. Pearson (1912) obtained what
appeared to be general results (at least for means and variances) for a selected
population, but his analysis contained an implicit assumption of linearity of
regressions. Bartlett (1938b) and Lawley (1943), while showing that this may not
necessitate normality of distributions, demonstrated that some rather special conditions
must be satisfied. Even if the regression of y on x is linear, the methods of Section
4 will break down unless linearity of the regression of x on y permits moments of X
to be derived from moments of Y. For the model in equation (2.2), Lindley (1947)
proved the necessary and sufficient condition that x has a linear regression on y to be
that the cumulant generating function of x is a multiple of that of w, and Curnow
(1960) has further studied this very special situation. For the general problem of
this paper, a different approach seems essential.

Equation (2.10) gives the m.g.f. of x for any y. Clearly M_{X|Y}(θ) and M_{x|y}(θ)
are identical wherever the former is defined, that is to say for any y that has a
non-zero probability of selection. Hence M_X(θ) can be obtained by integration of (2.10)
over the frequency distribution of Y:

M_X(θ) = \int M_{x|y}(θ) p_s(y) dy

       = P^{-1} \int h(u) exp(−(1/2)D^2) exp[ \sum_i \sum_j {γ_{ij} / (i! j!)} (σθ)^i (−D)^j ] Z(u) du ... (5.1)

by (3.4) and (2.14). Moreover, from (2.9) and (3.3) or from the terms independent of
θ in (5.1),


P= [Aue P exp [> * (-Dy] 2(и)аи. 1. (5.2) 
Write 70,0) = exp ЎЎ 25 (gg Dy - ... (6.3) 
3 3 ilj! 


and rearrange the terms involving cumulants of the second order in (5.1), (5.2). 


Then $9202 
В J h(u) f(8, D) Z(u—po0)du. Ба 
Мт TOOD ОТИ E 


This can be put into a form more reminiscent of (4.5); integration by parts, as for
equation (4.13), leads to


_ ei" ATF, — D) h(u+-pod)] 
AM y(0) ЕО, —D hu) Е ... (5.5) 


When p(x, y) is bivariate normal, f(θ, −D) is identically unity and, since
ρσ = βω,


(5.5) reduces to (4.5). From (5.5), by formal expansion in powers of θ, the cumulant
function of X can be expressed in a manner similar to (4.10).


6. GENERAL CUT-OFF SELECTION


Although (5.5) is a concise expression of the most general result, (5.1) is more
useful in the development of explicit formulae for moments. The process will be
illustrated for simple cut-off selection, again with (3.5) as the selector. The process
is the same as in the previous (1956) paper, but results for a general p.d.f. of x, y are
now available and the expansions in series have been taken to higher order.


The first step is to expand the operator in (2.14),

F(D) = exp(−(1/2)D^2) exp[ \sum_i \sum_j {γ_{ij} / (i! j!)} (σθ)^i (−D)^j ], ... (6.1)


in powers of (σθ) and D. The general term in the expansion is


vee (обречен... (— рро. | 


i Т ! 
РТ... GPG (Лу. 7% e Ye 
(6.2) 


except that any term involving γ_{02} is to be omitted, in order to take account of the
factor exp(−(1/2)D^2). Cornish and Fisher (1937) have suggested that the cumulants of a
distribution will often be in decreasing order of magnitude in respect of a parameter, n,
representing the size of a sample. An appropriate supposition, in line with Cornish and
Fisher, is that


γ_{ij} = O(n^{1−(i+j)/2}) for i + j > 2. ... (6.3)


The expansion of F(D) has been arranged in powers of (σθ); the terms in (σθ) have
been taken to O(n^{-2}), and those in (σθ)^2, (σθ)^3, and (σθ)^4 to O(n^{-3/2}). This expansion
is not written out here, but it is of a pattern similar to that of equation (20) in the
previous paper.


Suppose that η has been so chosen as to cut off a proportion P of the
population; write


U = η/ω ... (6.4)


and regard P as a function of U. Then (3.3) becomes
P(U) = \int_η^∞ p(y) dy. ... (6.5)


Define T as the unit normal deviate corresponding to a specified probability P_0(T); i.e.


P_0(T) = \int_T^∞ Z(u) du, ... (6.6)


with Z(u) as defined in (4.12). A result of Cornish and Fisher (1937), extended by
Fisher and Cornish (1960), now enables the value of U corresponding to a particular
T to be written as an infinite series; if


P(U) = P_0(T), ... (6.7)


then a consequence of (2.9) is that, to O(n^{-2}),


U = T: WT 1)4- в) 55 YET 5T) 


1 
120 


+ Yos T4 —6T*4- 3-3 YoxYos(T4—5T? +2) 

+ E- YR 127—537] -1)4- 2 grim +157) 

— по Yers 1779810) a TROIS 240+ 207) 

m ViVo (4781097 4-:1077)— a y est ner 15117): ... (68)
Expansion of Z(U) then leads to


AU) = т) 1 L аа) 1 Уы(Т*—3Т°)-- 2. уЫ(Те--Т#—т1#—1) 
6 24 12 
(559 


=ч all eran, jag Two" T5 23-9). 
= iss ныт ат-ат iur рату 1 7%(Т9-1074-4-157%) 
+ m YuYa(T* —52T4-L 127243) 4 so y&,(T8-L-27 — 57 74-- 7872) 
EC Yüsy o (T19-1- 88 — AT8 —434741-513772-1-24) 
is H iror Ретона К o» 


the earlier terms of which appeared as equation (24) in the 1956 paper. 


For a specified cut-off proportion, P, equation (5.1) may be written


M_X(θ) = P^{-1} \int_U^∞ F(D) Z(u) du. ... (6.10)


Since \int_U^∞ D^i Z(u) du = −D^{i−1} Z(u) at u = U, ... (6.11)


terms on the right hand side of (6.10) can be expressed in Hermite polynomials with
argument U. If the selection is specified by η, the point of cut-off, the moments
of X are now available. These were not presented in the previous paper, but selection
by a pre-determined cut-off level may often be employed and the first four moments
are, therefore, given below in equations (6.13) to O(n^{-2}) and (6.15), (6.17), (6.19)
to O(n^{-3/2}). Each of these involves P(U), which can be determined from the terms
independent of θ in (6.10) in the form of P_0(U) and an adjustment consisting of an
infinite series of Hermite polynomials multiplied by Z(U); this appears as equation
(6.12). An alternative form of results is obtained by substitution from (6.8) and
(6.9), so expressing the moments with argument T instead of U. Each moment is
then written as a series involving the cumulants of p(x, y), or more conveniently the
γ_{ij}, and T, the unit normal deviate corresponding to the proportion selected, P.
These results, generalizations of equations (26), (28), (29), (30) in the previous paper,
appear as (6.14), (6.16), (6.18), and (6.20) below. In all these formulae, the grouping
of terms retains the pattern of the Cornish-Fisher expansion.

From the terms independent of θ in (6.10),


РОО) = РА) (p va D*— 3 yu DI A BaD A ya DA 


H 1 A " 
ag отв) 20) atu = 0. 227751019) 


The terms in (σθ) have been evaluated to order n^{-2}, as accurate values for the mean
are likely to be particularly important. The results are


M(X) = Ра [р ris D--gyiD*)-- (руат (буар) 


— 33g бн pos D say CoD E 3ysgyo D? -PYYososD") 


Voas (9Y0 3D" +P Vs") + ar 6y,5D*-- py D°) 


ар у (бУоУ 2% —À “Еру. ауыз E (8041308 ру" ) 


-Tm 


tu 38 xo (бУозУззУоа1#-_ 4g; D* +руйвум") 


+ тор Ove Doy D] 24) at и = U e E 


and 


аА(Ху- м 


| p+ E. Зуз- 2p yos) Tx (4у3— ЗУ (T* — 1) 


—gg був M25) т") 


ШЕ ES (Ey14— 49/05 (7* —37)— 54 gi ura F3ynlo— Sposa) 13—21) 


зи (9а 8ру(192Т3--177)-- gs (Yu ev 67?--3) 


- alg бууы Br Призы 2T 9748) 


чыт (SY uaa WPVM IE 127?+5) 


E gg Voal Via PT 47T?+-13) 


1 зувола lpr) ao 6807+151). г — (614) 
77 7776
Similarly, from the coefficient of (σθ)^2,


ПХЭ) = ot p Pot [p — {6Y + (Yoo + бру) руа Dt] 
+ э буы D-E (Yot 80V) DYD} 
zh ES (12у-- аре Geni E ен 
= iam (ut 10py,,)D*-- p*ys 06) 


J 
ER 12 05y22- 24719713-+ вул) 
+(Yos¥oat8PY03¥13 + 8p" 1304) 6 +о?УозУо Р) ; 


1 
— тә (18875 +547 097/%0)D°+ (y8s-+ 180785720) D* 


+WD) | Z(u) at w= 0 


and, with the aid of (6.12), 


BR) = o TD [PT4 Gya--3oy(T*—1)— phys (T1 1) 


1 З 
is 150 2T F4pyss( T3 — 37) —9p*y (T8 —277) 


1 
E LUE 127 3YaT --9yf;(T3— 37) — 360 sy, (T9— 27T)--2p*y&(12/T9— 17Т)) 


1 
+ go Cal 1)--Spy, (T — 679-13) — 29 (27497243) 


1 
с 75 = 12уозу»(2Т®—1)-- Ту yy, (T1 —6T24-3)— 9ya yo (T?—1) 


—16pyYo3713(274— 9T8-1.3) — ӘрузУм(874--1272--5) 
+39? Yo5¥o4(1474— 477%--13)) 


1 
ЛЕ 7824 {буит 1)--54уыУ (275-972. 3)
Again, from the coefficient of (σθ)^3,
АХ) = ey ЕР o| 3p-- DI (O71 + 1807n) DHP + 9p :3)D*-- pri D) 
: ы” | 


: | } 
БЕ. 54 (24, -- (12434-36053) D - (3s, 127713) --р3уы 09) 


1 
E 12 (108,55, D? -- (18,32 - 36003549 y12)D* 
+(3pYG3 - 18079313) D5 2- p*yts 09 
1 
—130 (605,,D -- (15744-60055) D3--(3p»/,5--15p*y14) D* - pos 09 


1 
CER (2413-108 1222-72 3113) D* 


+(012уоузз4-9Уз2Уоа 1 36рУозУгз 72P 712719 18072170: )р° 
+(8рУозУог-120°УозУ1з- 9p*y494) D 4-рэуовУоа 0°) 
1 
— ров (Vout 162y{2)D® 
+(27y857 12+ 49s - 162] oy 12) D* 
+(8убз-270°удузз) D* ў 
+] Zu) at u = U ... (6.17) 
and 


пэ) = oyoo AED) spei (80а ph руа? 


--3p*y, (78—37) 2p ys (T9 — 21) 
+ yuQ руа 1) — Spy (T? 1)--4ph (TOP 8) 
—phyy (T1 1279-5) 
+5 {187 9¥a1(Z?—1)—6(YosY12F 2pYosYar)(2T2—1)+ Эру“ 67" + 8)
don P+ Sorat Agra 87) aya (937)
-p5p*y,(75— 107? 4- 157)— 4p*ys(T5—71? + 8T) 
ds at Вузу: Т 6(ЗугУзз-Е2УзтУзз)(Т®— 3T) 


3 (4yo V1 + 3V1 F Hopes S0 (T 2T) 
-- 12py, yy (T5 —1079--15T)-- 18pygyo (T? — 27)
E999 yy (1275— 699-647)
ag “С 324y oaa 3 — 27) - 27 (T5— 10T* 4-157) 
E18 (yl рут (12T*— 117) —20pygyts(T5— 8T*-- 9T). 
= 16ру%(1275-- 177) 4-36р?у5;у12(167°—1017°%-- 90T) 
—ap*yi (S0T'5— 409^ + 3047 ]. 24 (6.18)
Finally, from the coefficient of (σθ)^4,
HX!) = o*(3--) 
| Poot өара». (буи 4руь (309,1 36p*y,)D* 
-F(69*yos-- 12*y4,)D*- p*y,, 09) 
+ (00 967 )D+ вру 726% D-H (ts ауырыуы 


1 Р 
+79 (044,5) 421673) D--(12ys,5 - 1081,---48ру45/у%--4392ру,ыун) D* 


+(72руозуца--7 2р?уозУ», -108р?уз,)Р5 
(6020 --24р?у,зузз)0"--р^ув, D9) 


1 > 
777120 (120у,4-1-(120у,4- 240ру,) 24-(60ру,4-- 120p*y,,) D^ 
-F(6p*y55-1- 20p*y,4)D®-+ phy. 8} 
1 
777144 {(288y,2731-+ 432) 0100+ 9639719) D? 


(72703 Е 144y,357- 307510, + 9603731 + 432p 719700 
12880701713 + 2407/54) D4 i 
+(48pY93713-+ Тарауы Ун: 367497 0 + 144977197 13-+36p* 79104) 
+(6p°Yox7% 04+ 1609 ¥99713-+ 12% y,¥94) D8 НР“ D} 
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А : 
+ 30 {30741+30(Y25-+ 2753 D? —1)-- 15(py,4 4-2p*yss (T4 — 6724-3). 


— 6p*y s (2T —979.1:3)--5p5y (T9 15775-14572 18) 
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Moments about zero alone have been given, as apparently no appreciable reductions 
of formulae occur when the higher moments are expressed in deviations from μ′₁(X).
Equations (6.14), (6.16), (6.18) and (6.20) reduce to (26), (28), (29), and (30) of Finney
(1956) (except that additional terms are now available), by substitution of the
parameters particularly appropriate to the earlier model:
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7. TABLES 


In the use of the results of this paper, numerous functions of the unit normal 
deviate, T, are required. Possibly the most commonly needed are those involved in 
equations (4.16) and (6.14). The following functions : 


i 
T АТ), Tae log РАТ) for i— 1,2,3,4, 5, 


T/6, (T²−1)/24, −(2T²−1)/36, (T³−3T)/120, −(T³−2T)/24, (12T³−17T)/324
were tabulated in Finney (1956) for P₁(T) = 0.001, 0.005, 0.01, 0.0125, 0.02, 0.025
(0.025) 0.100 (0.05) 0.95. This selection of values was chosen as containing the most
useful for preliminary studies of 2-stage selection.


Since then, Mr. S. Lipton (University of New South Wales) has prepared 
extended tables of the same functions, so far for the range P₁(T) = 0.00001 (0.00001)
0.00250, 0.0001 (0.0001) 0.0250, 0.001(0.001) 0.500. These have not yet been published. 


I am indebted to Mr. A. D. Henderson for checking part of the algebra.

ANALYSIS OF ERRORS IN CENSUSES AND SURVEYS WITH
SPECIAL REFERENCE TO EXPERIENCE IN INDIA*


By P. C. MAHALANOBIS and D. B. LAHIRI
Indian Statistical Institute
I. INTRODUCTION


1. The demand for statistical information is growing increasingly and 
rapidly, and survey organisations are hard pressed to supply results with the required 
speed. They must assess the quality of the data and caution the users that the 
results are not perfect, but they must avoid creating an undue impression that the 
data are so unreliable as to be of no use. The manner in which the existence of 
errors can be presented would depend on the level of statistical maturity of the 
country, not only of a select group of survey experts but also of the direct users, the 
policy makers, and to some extent the general public who may be affected by any 
policy decision based (at least partially) on the statistical information. There is,
however, general agreement among survey experts that survey reports should supply
some idea of the reliability of the results.

2. The study of errors in surveys has two purposes: first, to guide the
user in interpreting the results; secondly, to improve the quality of future surveys.
Even when great care has been taken to set up important controls it is still necessary
to get an assurance that the controls were effective and that results with the desired accuracy¹
have been obtained.

3. In India a number of techniques are being used for evaluation of survey 
results. All depend on comparisons among alternative (independent) estimates. 
Some of these may on a priori grounds be assumed to be more reliable than the others. 
4. An illustration of this is provided in the evaluation of the results supplied
by (a) (so-called) complete enumeration and (b) by sample survey of the outturn of
jute in Bengal against (c) statistics of jute trade obtained subsequently which are
known to be very reliable. This is a rather rare example where opportunity had
occurred of comparing three sets of estimates for two consecutive years.

5. In some situations there are no a priori grounds for preferring one estimate
to another. In this case the divergence between two or more independent estimates
may provide a basis for a mental appraisal of the margin of uncertainty. This is
made possible if the results are based on interpenetrating net-work of (sub) samples
(IPNS) each of which is surveyed and/or processed by different but comparable
operational units. When two (or more) samples are drawn from the same population
and covered according to the same survey design, the results based on the different
samples are equally valid, even though they are derived by different operational
units; and divergences between the different sets of estimates supply directly some
idea of the margin of uncertainty. It will be noticed that the operational differentials
find a reflection in this manner of assessing the margin of uncertainty.

* This paper, originally presented at the 32nd Session of the International Statistical Institute, is
being reprinted here with the kind permission of the Editor, Bulletin of the International Statistical
Institute.

1 For practical purposes the desired accuracy can often be thought of or specified as the margin
of ‘permissible error’ in the sense that any decision to be taken on the basis of sample estimates would
remain the same within the limits of the permissible error.
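The appraisal described in paragraph 5 can be put in numbers with a short sketch. It assumes, beyond what the text states, that the k sub-sample estimates are independent, of equal size and equally weighted, so that the standard error of their mean is s/√k (for k = 2 this is half the difference between the two estimates). The function name `ipns_summary` is hypothetical; the two figures in the example are the 1944-45 jute sub-sample estimates from Table 1.

```python
from math import sqrt
from statistics import mean, stdev


def ipns_summary(estimates):
    """Combine k sub-sample estimates of the same quantity.

    Assumption (not from the paper): the sub-samples are independent,
    equally sized and equally weighted.
    Returns (combined estimate, standard error, spread between extremes).
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two sub-sample estimates")
    combined = mean(estimates)
    # Standard error of the mean of k independent estimates: s / sqrt(k).
    standard_error = stdev(estimates) / sqrt(k)
    spread = max(estimates) - min(estimates)
    return combined, standard_error, spread


# 1944-45 jute sub-sample estimates, thousand bales (Table 1, rows 7.1 and 7.2).
combined, standard_error, spread = ipns_summary([6836, 6518])
print(combined, round(standard_error, 1), spread)
```

The simple mean computed here need not equal the full-sample figure in Table 1, which is built from district-level acreages and yield rates rather than from an unweighted average.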


6. The interpenetration at the operational level may be of various types. 
For example, the field work is conducted by the same agency but the processing is
done by different agencies.² Or, the field work is by different agencies but processing
by the same agency; for example, in the survey of landholdings in India (reported
in this paper) the entire processing work was done in the Indian Statistical Institute,
even though the field work was conducted by both the Central and State agencies.
Again, there may be complete interpenetration where both the field and processing
work are done by completely different agencies; this is the form which the Indian
National Sample Survey (NSS) is gradually assuming with the participation of the
different States of India. Even for the same agency our normal practice is to
arrange field work with interpenetration in respect of parties of investigators. Study
of party and agency differences finds a place in our section on landholdings. There
are also some marginal cases where even for the same agency the processing is done
at different centres, or by different tabulating units, or at different points of time.


7. It should, however, be pointed out that, strictly speaking, a complete
absence of a priori preferences is not always a reality. For example, if it is known
that a particular agency has had a long experience in a particular field then the results
thrown up by it may be accepted to have higher validity. Or, when one survey is 
carried out by temporary ad hoc staff and another survey by a whole-time permanent 
statistical staff then some may be inclined to accept the results thrown up by the 
latter to have higher validity. For the same reason one may accept an intensively 
supervised well conducted sample survey by qualified, experienced and well-trained 
investigators to have higher validity compared to a complete enumeration conducted 
under usual census conditions. Examples of this situation will be found in our 
sections on spot-check of crop census records and sample verification of livestock 
census. 


8. One difficulty of evaluation is that the assessor may not be in possession
of full background information about the agency, or about the conditions under
which the census data, for example, may have been collected or processed.
Sometimes comparisons between census and sample check or those between interpenetrating
samples done by different agencies may provide corroborative evidence in support
of certain ‘feelings’ based on previously available background information which
may even be of a vague and inadequate nature.

2 This is the practice for the pre-harvest crop-acreage survey where the field work is conducted
by patwaris under the State authorities. The processing is done by three agencies each covering two
sub-samples, the agencies being (a) the State authorities, (b) the Directorate of National Sample Survey,
Government of India, and (c) the Indian Statistical Institute.

9. It may be worth-while stressing an obvious point which, however, is
sometimes not kept in mind. For proper evaluation of results it is essential to take
into consideration all available information, not merely the internal evidences supplied
by the survey alone; not merely the quantitative external evidences but also even
vague non-quantitative evidences. There is usually a residual element of subjective
appreciation even when decisions are sought to be made in an objective way. The
aim must be to utilise the whole of the available evidence in such a way that a
maximum amount of agreement can be reached among competent assessors. This
is the basic approach of science.

10. In some methods of evaluation the emphasis is on unitary check where 
the purpose is to evaluate the quality of data collected at the primary unit of enquiry. 
Evaluation of results for small administrative units is also possible. These are
illustrated by the spot-check of crop census records and sample verification of the
livestock census. Such checks can supply not only material for the study of errors
of ascertainment, unit by unit, but if performed on a random basis, may also provide
a means for separation of total net error of an aggregate estimate into ascertainment 
error and errors of coverage and compilation. (See our section on livestock census). 


11. Another useful technique is to break up the survey period, for example,
one round³ of the National Sample Survey (NSS) into a number of sub-rounds, and
to compare the estimates for each sub-round. Differences in such sub-round
estimates, if any, would give valuable information for proper interpretation of the data.
This is an example of a survey plan where the work of the same agency can be
evaluated against its own work conducted under somewhat different conditions.
(See our section on population growth).

12. The above idea of self-evaluation has been used in other forms. For
example, in crop cutting experiments, to estimate the yield of crops per hectare, the
crop is harvested separately, at each sample point, in the form of two or three
concentric sample cuts. To what extent the work has been done under control can be
then studied from the magnitude of the divergence between different estimates, each
based on a different size of cut.

13. A similar device which has been found to be of value is using more than
one reference period of time for the collection of data by the interview method. In
some designs the data are collected for a long period (for example, one year) in such
a way that tabulations by shorter reference periods (e.g. one month) are possible.
Comparisons of results based on different periods of reference may reveal factors
like ‘recall lapse.’ An illustration of this is provided in the survey design for the
estimation of birth, death and growth rates.

3 NSS is a multi-purpose survey; a specified group of subjects is covered in an integrated manner
in a single “round” to be completed in a specified survey-period; different rounds have usually a different
survey plan depending upon the group of subjects chosen and upon the relative emphasis placed on different
aspects of the survey.


14. Another technique is to compare estimates obtained from one method 
of collecting the data against another more reliable method for estimating not the 
same character but a related one which sets a lower (or upper) limit to the former 
character. An illustration is provided in the comparison of death rates obtained by 
the interview method against a method of keeping a watch (by the method of re- 
enumeration) on a sample of individuals. Comparisons may also be made of results 
obtained by a less intensive enquiry against those based on a more intensive one. 
(See section on population growth). 


15. As the Indian National Sample Survey (NSS) is carrying out surveys
round after round, it is possible to compare the results based on two or more rounds. 
‘Consistency’ over rounds adds to the confidence with which the results may be 
accepted. 

16. Another technique which appears to have good potentialities is com- 
parison over ‘space’ as distinguished from ‘time’ (or rounds) which we have just 
described. Illustrations of similarity of the nature of divergence from one area to 
another are to be found in our sections on landholdings and sample verification of 
livestock census. 


II. JUTE PRODUCTION IN BENGAL 1944-45 AND 1945-46


17. It is not always that one gets an opportunity of evaluating the results 
obtained on the basis of complete enumeration and sample survey against a third, 
but extremely reliable, figure. Such an opportunity was availed of in two consecutive 
years in regard to the 1944-45 and 1945-46 Jute Crop of Bengal. 


18. Jute being a cash crop of international importance, accurate export 
trade figures are maintained and become available about 15 months after the harvest- 
ing season. Being a crop of such importance there is naturally a great demand for 
accurate statistics as early as possible. The official forecast based upon plot-to-plot
enumeration attempted to meet that demand. Sample surveys were also conducted
by the Indian Statistical Institute. These were objective methods of enquiry where
a random sample of fields was taken up for actual physical examination for acreage
estimation; and harvesting of a random sample of crop cuts followed by necessary 
weighments provided the yield rates. There were two interpenetrating sub-samples 
(IPNS) which were covered by different parties of investigators. 


19. Table 1 gives the relevant information. It will be noticed that in both
years the official forecasts based on complete count were very much out
whereas the sample survey estimates were quite close to the trade figures. The two
sub-sample (IPNS) estimates in both years agreed with the trade estimates within
roughly 3 per cent while the estimates based on the so-called complete count differed
from the trade figures by 27.2 per cent in 1944-45 and 16.6 per cent in 1945-46.


TABLE 1. COMPARISON OF OFFICIAL (COMPLETE COUNT) AND SAMPLE SURVEY
ESTIMATES OF JUTE PRODUCTION WITH TRADE FIGURES, BENGAL,
1944-45 AND 1945-46ᵃ

                                                              quantity (thousand bales)
particulars about jute crop                                     1944-45     1945-46
(1)                                                               (2)         (3)

1. consumption during the season
   1.1 in jute mills (actual)                                    6000        6308
   1.2 exports (actual)                                          1050        2213
   1.3 in villages (estimate)                                     600         600
2. total                                                         7650        9121
3. consumed from previous year's stock                            324         697
4. jute crop in other provinces                                   598         862
5. balance: Bengal crop, trade figures                           6728        7562
6. complete count: Bengal crop, official forecast                4895        6304
7. sample survey: Bengal crop, Indian Statistical Institute
   7.1 sub-sample 1ᵇ                                             6836        7734
   7.2 sub-sample 2ᵇ                                             6518        7773
   7.3 full sample                                               6686        7755
8. discrepancy of (6) on (5)                                   −27.2%      −16.6%
9. discrepancy of
   9.1 (7.1) on (5)                                                ..          ..
   9.2 (7.2) on (5)                                                ..          ..
   9.3 (7.3) on (5)                                             −0.6%       +2.6%

a 1 bale = 400 pounds, 1 maund = 82.2857 pounds.
Trade figures are reported in bales but the sample survey was carried out on the basis of maunds, which
have been converted into pounds and then to bales (the approximation 1 bale = 5 maunds sometimes
used would give somewhat lower figures).

b Sub-sample estimates of production have been obtained by multiplying sub-sample estimates
of crop acreage by the full sample yield rates at district level (IPNS estimates of yield rates being not
easily available now).
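The balance and discrepancy rows of Table 1 follow from the other rows by simple arithmetic, which the sketch below retraces. The variable and function names are illustrative; the 1945-46 export figure is taken as 2213 thousand bales, the value implied by the printed total of 9121, and that reading is an assumption of this sketch. The percentages are expressed on the trade figure (row 5), as the row labels "on (5)" indicate.

```python
# Thousand bales; each tuple is (1944-45, 1945-46), from Table 1.
consumption = {
    "jute mills": (6000, 6308),
    "exports": (1050, 2213),  # assumed reading, implied by the total 9121
    "villages": (600, 600),
}
previous_stock = (324, 697)
other_provinces = (598, 862)
estimates = {
    "complete count": (4895, 6304),
    "sub-sample 1": (6836, 7734),
    "sub-sample 2": (6518, 7773),
    "full sample": (6686, 7755),
}


def pct_discrepancy(estimate, standard):
    """Percentage discrepancy of an estimate on the standard figure."""
    return round(100.0 * (estimate - standard) / standard, 1)


# Row 2: total consumption; row 5: balance attributed to the Bengal crop.
total = tuple(sum(v[i] for v in consumption.values()) for i in range(2))
trade = tuple(total[i] - previous_stock[i] - other_provinces[i] for i in range(2))

discrepancies = {
    name: tuple(pct_discrepancy(values[i], trade[i]) for i in range(2))
    for name, values in estimates.items()
}

for name, values in discrepancies.items():
    print(f"{name:15s} {values[0]:+6.1f}% {values[1]:+6.1f}%")

# Footnote a: maunds to bales via pounds.
POUNDS_PER_MAUND = 82.2857
POUNDS_PER_BALE = 400
bales_per_5_maunds = 5 * POUNDS_PER_MAUND / POUNDS_PER_BALE  # slightly above 1
```

Because 5 maunds weigh a little more than 400 pounds, treating 5 maunds as one bale yields fewer bales, which is the direction stated in footnote a.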


III. SPOT-CHECK OF CROP CENSUS RECORDS, 1937, 1949-50 AND 1950-51


20. For a very large portion of the cadastrally surveyed area of the country 
the statistics of acreage under important crops are obtained every crop season by
the census method. The data are not collected by interviewing the cultivators.
Actual visits to all the fields in a village are to be made; and therefore the data should
be relatively free from sizeable ascertainment errors, especially when these are
collected by the patwari, a village official, who is well informed about the crops grown
in the locality.


21. The patwari, however, is a land revenue official and the crop records are
primarily maintained for this purpose; and it is pertinent to enquire whether or not
the data fulfil the more exacting requirements of crop statistics. He is moreover
burdened heavily with multifarious duties and therefore actual visits to the fields
may not always be possible.


22. A spot-check of the census records was therefore undertaken in the Rabi
Season, 1949-50, the work being under the control of the Department of Economic
Affairs of the Ministry of Finance. The Government secured the services of a very
senior officer with more than 20 years' experience in land revenue work. Moreover,
this officer had himself set up a crop census organisation in one of the States.


23. A batch of specially experienced investigators made visits to a sample 
of fields in the presence of the patwaris, and after securing their agreement noted the 
actual utilisation of the fields (whose areas were accurately known); the corresponding 
entries in the census records were also noted. A similar spot-check was again
conducted during the Rabi Season of 1950-51. A summary of the findings in respect of
wheat, barley, gram, arhar, mator, mustard and linseed is given in Table 2.


TABLE 2. COMPARISON OF CROP ACREAGES AS REPORTED BY PATWARIS AND CHECKERS,
1949-50 AND 1950-51

                                           acreage           sum of      sum of      discrepancy as
name of       number of  number of                           positive    negative    % of col. (5)
cropsᵃ        villages   comparisonsᵇ  patwari   checker     discrep-    discrep-
                                                             ancies      ancies     absolute  algebraic
(1)             (2)        (3)           (4)       (5)         (6)         (7)        (8)       (9)

1. 1949-50
1.1 wheat       391       8097        3492.0    3030.9       965.8      −504.7        49        15
1.2 barley      391       4144        1185.7    1117.8       468.5      −400.5        78         6
1.3 gram        391       6519        3010.3    3027.1       866.1      −883.0        58        ..
1.4 arhar       391       1120         298.9     494.1        57.9      −253.2        63       −40
1.5 mator       391       1340         195.7     260.4        54.9      −119.0        67       −25
1.6 mustard     391       3071         293.7     693.6       115.8      −515.8        91       −58
1.7 linseed     391       1145         134.0     269.2        59.3      −194.6        94       −50

2. 1950-51
2.1 wheat       167       1170        1005.2     863.6       205.3       −63.7        31        16
2.2 barley      167        277          91.7      81.3        23.6       −13.1        45        13
2.3 gram        167       1819        1739.0    1301.0       602.8      −164.7        59        34
2.4 arhar       167        610         183.9     255.4        82.7      −154.9        93       −28
2.5 mator       167        357         105.5     189.9        15.6      −100.0        61        ..
2.6 mustard     167        247          58.1      38.0        26.3        −6.2        85        53
2.7 linseed     167        130         174.1      26.7          ..          ..       580       552

a Crops in mixture allocated to components in 1949-50, but completely excluded in 1950-51.
b Does not include cases where neither patwari nor checker reports the specified crop.
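The last two columns of Table 2 appear to follow from columns (5) to (7): the absolute discrepancy adds the magnitudes of the positive and negative sums, the algebraic discrepancy nets them, and both are expressed as a percentage of the checker's acreage. This definition is inferred from the printed figures rather than stated in the text, and the function name is hypothetical; the sketch below reproduces the printed percentages for three 1949-50 rows.

```python
def discrepancy_percentages(checker_acreage, positive_sum, negative_sum):
    """Absolute and algebraic discrepancy as % of the checker's acreage.

    positive_sum: sum of (patwari - checker) over plots where patwari is higher.
    negative_sum: sum of (patwari - checker) over plots where patwari is lower
                  (a negative number).
    The definition is inferred from the table, not quoted from the paper.
    """
    absolute = 100.0 * (positive_sum + abs(negative_sum)) / checker_acreage
    algebraic = 100.0 * (positive_sum + negative_sum) / checker_acreage
    return round(absolute), round(algebraic)


# 1949-50 rows of Table 2: (checker acreage, positive sum, negative sum).
wheat = discrepancy_percentages(3030.9, 965.8, -504.7)
mustard = discrepancy_percentages(693.6, 115.8, -515.8)
linseed = discrepancy_percentages(269.2, 59.3, -194.6)
print(wheat, mustard, linseed)
```

A large absolute figure with a small algebraic one means that plot-level errors are sizeable but largely offset one another in the total; when the two are close in magnitude, the errors run mostly in one direction.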
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24. It should be pointed out at the outset that the 1949-50 results are not 
comparable with those of 1950-51 as mixed crops were excluded in the later year 
but not in the previous one. There are also differences in geographical coverage, 
and in the method of selection of sample villages and sample fields (plots). In the
earlier check the choice of the villages was conditioned by the selection of patwaris
made by their supervising kanungos and also by the avoidance of villages where
harvesting had already started. Also in the selection of the plots within a sample
village an attempt was made to cast the sample haphazardly so as to cover all portions
of the village. In the later spot-check both the villages and the fields were selected
at random. However, a substantial number of these villages could not be covered
by spot-check as the patwari could not be contacted, or even when contacted, he
had not completed the crop records due to other preoccupations like the population
or the livestock census. There were also some villages which did not fall under
the jurisdiction of any patwari.


25. Growing of crops in mixture is fairly common during the Rabi season. 
In the case of fields with mixed crops, the proportion of the field under mixed crop 
was shown separately. Also the acreages under the different constituents were 
generally recorded separately on the basis of eye-estimation. However, for the
1949-50 data there were a few cases where this allocation was not recorded and the
total area for these cases was divided equally among the constituent crops at the
tabulation stage. For the 1950-51 data this question did not arise because the
analysis was restricted to crops grown singly. The 1949-50 comparisons as given
in Table 2 show large absolute discrepancies ranging from 49 per cent for wheat
to 94 per cent for linseed. Part of the discrepancy is undoubtedly due to rather
defective⁵ instructions to the patwari. For example, he is asked to ignore all minor
constituents and report the entire area under the major component. The effect
of this for a crop like linseed, which is known to be grown extensively as a minor
component in a mixture, is obvious. Although as a component it may be minor
its total contribution is very large compared to the acreage under pure (unmixed)
linseed. The large negative algebraic discrepancy (−50%) for 1949-50 must be,
at least partly, due to this factor. The position is similar with mustard. Some
adjustments for the above shortcomings are reported to be made before the
publication of official acreage statistics; obviously, the basis for such adjustments cannot be the conditions
obtaining in the year for which the statistics are reported. The situation calls for 
drastic steps and not mere adjustments. 


26. A second source of discrepancy for which all patwaris may not be fully


5 In relation to agricultural statistics, but presumably not so for land revenue purposes. 
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the area to different components is made at the level of district each comprising 
about 2000 villages on the average. This means that the total district area under 
a particular type of crop mixture is allocated to the components according to the 
ratio⁶ fixed long ago, usually at the time of last settlement of land revenue, such
settlement operations taking place at intervals of 20 or 30 years. This is obviously
unsatisfactory. (Incidentally the problem of a completely satisfactory procedure
of allocation of mixed crops is still awaiting solution).

27. The magnitude of the discrepancies observed in 1949-50, however, is so
large that it is hardly possible to explain this to be solely due to causes for which
the patwari is not directly responsible. To obtain a clearer grasp of the role played
by the patwari the 1950-51 data were analysed only for plots (and sub-plots) for
which crops in intermixture were not reported. For such comparisons the
discrepancies must be attributable to patwari performance. There is unmistakable evidence
that the census records were far from accurate; the absolute discrepancy ranged 
from 31 per cent for wheat to 580 per cent for linseed. 

28. The 1949-50 spot-check was not on a random sampling basis, and the
1950-51 check, although planned on a random basis, could not be executed as such; 
and therefore it is strictly not possible to assess the net effect of the ascertainment 
error on the estimation of total acreage under a crop. The random components of 
the ascertainment errors of individual units may balance to some extent on aggrega- 
tion. To study the extent of this balancing we are presenting below summary results 
of one of our older studies conducted in 1937. 

29. A compact area, called a thana, 78 square miles in area (comprising 10
‘unions’ covering a total of 108 mauzas or villages), was completely covered by a census;
the acreage under jute was recorded plot by plot, for more than one-sixth of a
million fields. A sample of 11 mauzas was selected at random, and a second
complete enumeration was independently carried out in these mauzas by a different set
of investigators. Again a systematic sample of every 20th plot with a random
starting point was independently enumerated in all the mauzas. (There were other
types of re-enumeration which need not be stated here). The survey was conducted
by the Director of Agriculture, Bengal, and the statistical analysis of the data was
undertaken by the Indian Statistical Institute.


30. The average discrepancies between the jute acreages according to the
two enumerations are given below for the different units of increasing average sizes,
namely, plot (0.29 acre), village (462 acres), union (7.8 sq. miles) and thana (78 sq.
miles). At the plot level the average discrepancy is about 54 per cent of the average
jute acreage (according to census). The percentage discrepancy varied considerably
from one sample village to another: the median value is 26 per cent and the five
middlemost values ranged from 12 per cent to 33 per cent. As far as can be judged
from comparison of duplicated observations on an average of about 869 plots per
union the median percentage discrepancy at the union level comes to about 30
per cent; and for the thana the percentage discrepancy on the basis of 8586 plots
(random systematic) works out to nearly 18 per cent.

6 All these ratios were not (and still are not, in spite of best efforts) known to us at the time of
analysis of data.

31. From the magnitude of the discrepancies noted above it appears that a
census may not provide accurate estimates for small administrative areas. The 
random component of the non-sampling error may balance considerably for very 
large administrative areas with the result that for these large areas the results may 
not differ appreciably from those obtained by a carefully conducted sample survey 
if the bias component is comparatively small. But care should be taken not to inter- 
pret that what holds good for a large area also holds good for each (or majority) of 
its constituent (small) administrative units. 
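The balancing argument of paragraph 31 can be illustrated with a small calculation. It rests on assumptions that are not in the paper: every unit has the same true acreage, recording errors are independent between units, each error has a random part with coefficient of variation `cv_random`, and all units share a common relative bias. Under these assumptions the root-mean-square relative error of a total over n units is sqrt(bias² + cv_random²/n). The function name and the input values are hypothetical.

```python
from math import sqrt


def relative_rms_error(n_units, cv_random, relative_bias=0.0):
    """RMS relative error of a total aggregated over n_units.

    Assumes equal-sized units, independent random errors with coefficient
    of variation cv_random, and a common relative bias (illustrative model).
    """
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return sqrt(relative_bias ** 2 + cv_random ** 2 / n_units)


# Hypothetical inputs: 50% random error per plot, with and without a 10% bias.
for n in (1, 100, 10_000):
    unbiased = relative_rms_error(n, cv_random=0.5)
    biased = relative_rms_error(n, cv_random=0.5, relative_bias=0.1)
    print(f"n = {n:6d}   no bias: {unbiased:.4f}   10% bias: {biased:.4f}")
```

In this model the random part shrinks with the square root of the number of units while the bias part does not shrink at all, which is why agreement for a large area says little about its small constituent units and nothing about bias.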


IV. SAMPLE VERIFICATION OF LIVESTOCK CENSUS, 1956 

32. In most of the States of the Indian Union, the eighth quinquennial 
livestock census was conducted during February to April 1956 and the data were 
subsequently brought up to the reference date of 15 April 1956. The enumeration
was done by village officials like patwaris. The collection and compilation of these
statistics were the responsibility of the State authorities. The Central Government
in the Ministry of Food and Agriculture desired an independent sample verification
by a different agency, and, as a consequence, the National Sample Survey (NSS)
Organisation undertook this work during June-July 1956. 

33. For rural areas 1624 sample villages, selected from among those just 
covered for socio-economic enquiries in the tenth round of NSS were taken up. 
Similarly a sample of 340 urban blocks (1951 Population Census Enumeration
districts) was selected. For detailed enquiry 20 households were selected at random
from each of the sample villages; and in urban blocks all the households were covered.
In the sample units all non-household livestock establishments, which were very
few in number, were also covered.

34. Data were collected about the livestock in possession of a household on 
the date of survey together with information about changes since 10 April 1956, 
so that the number as on the census date of reference could be obtained. The census 
registers were consulted only after the collection of these data, and census entries 
corresponding to the sample households were copied on the sample verification 
schedule. This was done in the sample village or urban area itself in order to
minimise ‘matching’ difficulties.

35. It must be pointed out that the NSS investigators (enumerators) are 
wholetime quasi-permanent staff employed on a continuing basis, and are especially 
trained and experienced in the collection of data (including those on livestock) by 
the interview method, and their work is under intensive supervision. The data 
thrown up by them may therefore be considered to be of higher validity, and the 
census results may be evaluated against the results of the sample verification as
standard.
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36. Before passing on to the actual comparison of the results as shown in 
Table 3 we may refer to certain general features. Household by household enquiry 
revealed that in nearly one-fourth of the households having livestock the census 
enumerator did not make an actual visit. This fraction varied considerably from
State to State. It did not however mean that the census enumerator made completely
arbitrary entries for these households; he might have ascertained from neighbours,
or, being a local man, the recording might even have been from his personal
knowledge.
37. It was not always possible for the NSS investigators to secure the census 
figures for the sample households for various reasons. For example, the village
might not have been covered in the census (at least up to the sample verification
time), ‘non-availability’ of census records in the village (which may either mean
that census had not taken place, or that for some reasons the records had been
removed to some other place), ‘clubbing’ together two or more households in the census
records, etc. Cases of omission to record the census entry against the verification
figure might also have been caused by ‘matching’ difficulties, but such a contingency 
is likely to be very rare because the census enumerator (a local person usually the 
head-man or village patwari) from whom the census records were collected must 
have helped in identifying a sample-household in his records unless, of course, when 
it was a case of omission on his part. In about 9.43 per cent of the rural and 5.61 
per cent of urban sample households census figures were not recorded. A careful 
scrutiny of the investigators’ ‘remarks’ showed that there were very strong grounds 
to conclude that some (at least one-eighth) of these cases were due to census omission.


38. We now pass on to the comparison of sample verification estimates 
with the census figures. We restrict ourselves to rural areas which account for by 
far the larger proportion of total livestock. Table 3 gives the necessary information 
not only for all-India but also for the major States. Manipur, Orissa and West
Bengal are omitted because no census took place before the sample verification.


39. It will be noticed that there are highly significant differences in the
all-India estimates of total cattle and total buffaloes. There is serious under-estimation
in census figures; an increase of about 15 per cent is needed to bring up the census
figure to the level of sample verification estimates. At the State level also there are
significant differences. 


40. Another set of estimates called the census-sample (cs) estimates has also
been obtained in order to have some idea of the effect of different sources of error.
Table 4 gives the necessary information for all-India (rural) in respect of total animals,
total males, total females, working bullocks, and cows in milk, separately for cattle
and buffaloes. The sample verification (sv), the census-sample (cs) and census (c)
estimates are all shown in this Table.


41. In making the census-sample estimates we have made complementary 
use of census as well as survey data. From the census we have taken the counts 
of head of livestock for the sample households, and from the survey we have taken 
the counts of the number of households in the sample villages. In view of the fact 
that for some of the sample households census counts were not recorded, for reasons 
explained earlier, we proceeded as if those households were really taken into 
consideration by the census enumerator and assumed further that the average 
characteristics of those households were the same as those for which census data were 
available. (The latter assumption holds broadly for sample-verification data.) 


TABLE 4. COMPARISON BETWEEN SAMPLE VERIFICATION (sv), CENSUS-SAMPLE (cs) 
AND CENSUS (c) ESTIMATES: 1956 

Rural sector: All-Indiaᵃ 

number of sample villages: 1334        number of sample households: 27720 

                                                              adjustment factor 
livestock                sampleᵇ        censusᵇ   censusᵇ   total    coverage-     enumera- 
category                 verification   sample                       compilation   tion 
                         (sv)           (cs)      (c)       sv/c     cs/c          sv/cs 
(1)                      (2)            (3)       (4)       (5)      (6)           (7) 

1. cattle 
   1.1 total cattle      154.9          148.8     134.3     1.152    1.108         1.040 
   1.2 total males        86.2           83.2      73.8     1.168    1.126         1.037 
   1.3 total females      68.6           65.7      60.4     1.136    1.087         1.045 
   1.4 working bullocks   61.8           60.3      53.2     1.162    1.135         1.024 
   1.5 cows in milk       15.9           1726      17.1     0.930    1.038         0.896 

2. buffaloes 
   2.1 total buffaloes    46.8           44.5      41.1     1.138    1.082         1.052 
   2.2 total males        12.0           11.2      10.5     1.136    1.068         1.064 
   2.3 total females      34.8           33.2      30.6     1.139    1.087         1.048 
   2.4 working bullocks    5.2            4.9       4.9     1.053    1.002         1.051 
   2.5 cows in milk       10.1           11.3      10.8     0.932    1.046         0.891 

ᵃ Excludes Manipur, Orissa and West Bengal. 
ᵇ Figures in millions. 


42. The three estimates are subject to errors from various sources which may 
be broadly divided into the following classes: (1) coverage error: omission (or 
duplication) of villages or blocks, (2) listing error: omission (or wrong inclusion) of 
households in selected villages or blocks, (3) errors in enumeration (and categorisation) 
of livestock for the households concerned, (4) errors in compilation and editing 
etc. The sample-verification and census-sample estimates are in addition subject 
to sampling error. The listing error may be regarded as coverage error excepting 
in the situation described below. In this census there is reason to believe that in 
some instances the enumerators deliberately omitted to record households without 
livestock. To the extent this is done correctly the failure to list does not contribute 
anything towards coverage error. When, however, non-possession of livestock 
happened to be a wrong assumption such omission should strictly be regarded as 
enumeration error. 


43. It is believed that sample-verification (sv) estimates are subject to a 
comparatively small amount of error of the above types. We may take the sample 
verification estimate as the standard against which to compare the other estimates. 


44. It is clear from the method of obtaining the census-sample (cs) estimates 
that these are at par with sample-verification (sv) estimates, excepting for errors of 
enumeration of livestock, and therefore the ratio sample-verification estimate/census-
sample estimate, or sv/cs, may be taken as an index of (or an adjustment factor for) 
the net census-enumeration error. The last column in Table 5 gives estimates of the 
enumeration adjustment factor. 


45. Again one would expect agreement (excepting to the extent sampling 
error comes into play) between the census (c) estimates and census-sample (cs) 
estimates if census figures were relatively free from coverage and compilation (including 
editing) errors. We can therefore use the ratio (census-sample estimate)/(census 
estimate), or cs/c, as an index of (or adjustment factor for) net coverage-compilation 
error. The coverage-compilation adjustment factor ranges from 1.038 to 1.135 for 
the cattle categories noted in Table 4. The corresponding range for buffaloes is 
1.002 to 1.087. Generally the coverage-compilation error affects the aggregate figures 
more seriously than enumeration errors. 


46. An index or adjustment factor for the total error is given by the ratio 
sample-verification estimate/census estimate, or sv/c. It will be noticed that we 
have defined the adjustment factors in such a manner that the total adjustment factor 
is equal to the product of the coverage-compilation adjustment factor and the enumeration 
adjustment factor. 
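
The relation among the three adjustment factors can be checked numerically. The following is only an illustrative sketch: it uses the rounded all-India figures for total cattle and total buffaloes from Table 4, and the function name `adjustment_factors` is ours, not part of the survey procedure. Because the inputs are rounded to one decimal place, the computed ratios may differ from the printed factors in the third decimal.

```python
# Table 4 estimates in millions, as (sv, cs, c), for two all-India categories.
estimates = {
    "total cattle": (154.9, 148.8, 134.3),
    "total buffaloes": (46.8, 44.5, 41.1),
}


def adjustment_factors(sv, cs, c):
    """Return (total, coverage-compilation, enumeration) adjustment factors."""
    total = sv / c                  # sample verification / census
    coverage_compilation = cs / c   # census-sample / census
    enumeration = sv / cs           # sample verification / census-sample
    return total, coverage_compilation, enumeration


for name, (sv, cs, c) in estimates.items():
    total, coverage, enumeration = adjustment_factors(sv, cs, c)
    # By construction the total factor is the product of the other two.
    assert abs(total - coverage * enumeration) < 1e-12
    print(f"{name}: total={total:.3f}, "
          f"coverage-compilation={coverage:.3f}, enumeration={enumeration:.3f}")
```

The identity holds exactly because cs cancels in (cs/c) × (sv/cs), which is why the decomposition attributes the whole of the total error to the two components.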


47. Leaving apart cows in milk for the present, the total adjustment factor is 
around 1.15 for the various cattle categories, and nearly 1.14 for the various buffalo 
categories (excepting working bullocks for which it is 1.05). 


48. But for cows in milk, in spite of the fact that the coverage-compilation 
adjustment factor is greater than unity, there is so much over-enumeration in the census 
that the total adjustment factor becomes less than unity. It is to be remembered, 
however, that placement in the category cows in milk is likely to be subject to 
considerable ascertainment error, specially when the cow is nearing the end of its lactation 
period. Moreover, there is some amount of seasonal variation which might have 
added to the difficulty of ascertaining the position as on the census date (both the 
census and the sample survey were spread over a fairly long period, before and after 
this date respectively). 

49. The total adjustment factors are also subject to errors of sampling. 
We have estimated these for the all-India categories, total cattle and total buffaloes; 
the standard errors are 1.92 per cent and 2.36 per cent of the respective adjustment 
factors. For enumeration adjustment factors the percentage standard errors are 
roughly half of those shown above. 
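
To see what these percentage standard errors imply, one may express the departure of each total adjustment factor from unity in standard-error units. This is a rough sketch under the assumption of an approximately normal sampling distribution; the function name `z_against_unity` is ours, and the factors 1.152 and 1.138 are those printed in Table 4.

```python
def z_against_unity(factor, pct_se):
    """Departure of an adjustment factor from 1, in standard-error units.

    pct_se is the standard error expressed as a percentage of the factor.
    """
    standard_error = factor * pct_se / 100.0
    return (factor - 1.0) / standard_error


# Total adjustment factors (Table 4) with the percentage standard errors
# quoted in paragraph 49.
for name, factor, pct_se in [
    ("total cattle", 1.152, 1.92),
    ("total buffaloes", 1.138, 2.36),
]:
    z = z_against_unity(factor, pct_se)
    print(f"{name}: factor={factor}, z={z:.2f}")
```

Both ratios come out well above the conventional value of about 2, which agrees with the description of the all-India differences as highly significant.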


50. All the adjustment factors have also been calculated for the rural areas 
of 12 major States. The census took place in Orissa and West Bengal subsequent 
to the sample verification. For these States it was possible to calculate only the 
total adjustment factor. We have presented in Table 5 a consolidated picture of 
the situation. Since the sample size for most of the States was comparatively small 
not much significance can be attached to individual adjustment factors. But taken 
together they reveal significant tendencies. 


51. It will be noticed that except for cows in milk there is a general tendency 
for adjustment factors to be greater than unity. This tendency is more marked in 
the case of cattle. Although no great significance need be attached to specific cases 
one may conclude that there are in fact a few States where the actual number in 
certain categories exceeds the census number by 25 per cent or more. 


52. In order to obtain a more detailed case by case picture of enumeration 
error we have obtained a few two-way tables showing the distribution of all-India 
(rural) sample households by number of head of livestock as reported by the NSS 
investigator and as recorded by the census enumerator. The table for total cattle 
would have been preferred but to save space we are presenting, by way of illustration 
of general features, Table 6 for working bullocks (cattle) only. 


53. Out of nearly twenty-eight thousand households covered by NSS the 
comparison was possible for 91 per cent of cases. Out of these comparable cases, 
in 38 per cent of the households there were no cattle according to NSS. But in nearly 
5 per cent of these no-cattle households, the census recorded one or more head of 
cattle, the average being more than two. Among those with cattle, the census 
recorded complete absence in about 5 per cent of cases, and in two-thirds of the former 
households there was complete agreement between the two agencies. In spite of this 
agreement the average discrepancy between the census and NSS entries was 
considerable in relation to the number of head of cattle possessed by a household. In 
fact, the standard deviation of discrepancy was as much as 56 per cent of the average 
number of head of cattle per household. The picture for working bullocks (cattle) 
is very similar, the corresponding standard deviation being of the same order. The 
gross enumeration-error was therefore very high. 
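
The distinction between net and gross enumeration error can be illustrated with a household-by-household comparison. The sketch below uses a small set of invented paired counts (they are not survey data) and a helper name, `enumeration_error_summary`, of our own choosing; it assumes the discrepancy is taken as census count minus NSS count.

```python
from statistics import mean, pstdev


def enumeration_error_summary(nss_counts, census_counts):
    """Summarise household-level discrepancies (census minus NSS)."""
    discrepancies = [c - n for n, c in zip(nss_counts, census_counts)]
    return {
        # Net error: positive and negative discrepancies cancel.
        "net_error_per_household": mean(discrepancies),
        # Gross error: spread of discrepancies relative to average holding.
        "sd_relative_to_mean": pstdev(discrepancies) / mean(nss_counts),
        # Proportion of households on which the two agencies agree exactly.
        "agreement_rate": sum(d == 0 for d in discrepancies) / len(discrepancies),
    }


# Hypothetical head of cattle for eight households, as recorded by each agency.
nss = [0, 2, 2, 4, 0, 3, 1, 2]
census = [0, 2, 1, 4, 2, 3, 0, 3]

for key, value in enumeration_error_summary(nss, census).items():
    print(f"{key}: {value:.3f}")
```

In this made-up example half the households agree exactly and the net error is small, yet the standard deviation of the discrepancy is about half the average holding, which is the kind of situation described above.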


54. We now come to two important points which are sometimes made in 
favour of census as against sample survey. First, the census can provide reliable 
counts for small administrative units (say, villages, each with 100 households on the 
average), and secondly, accurate measurement of change between two fairly close 
points of time is furnished by the census. It is obvious from the nature and 
magnitude of the enumeration and coverage-compilation errors that the livestock census has 
hardly satisfied these objectives. 


EVALUATION OF STATISTICAL SURVEYS 


V. SURVEY OF LANDHOLDINGS, 1954-55 


55. In the eighth round, 1954-55, of the National Sample Survey we had for 
the first time an opportunity to organise the field work in a completely independent 
interpenetrating system. The entire sample for a State was broken up generally into 
12 independently drawn sub-samples, and 4 of these were covered by the Central 
agency and the remaining 8 by the State agency. In a few States there were 
additional sub-samples which were not taken into account in any of the Tables and 
Charts excepting Table 11 and Chart 3. The data thrown up opened the possibility 
of examining the relative biases of the two agencies. It should be pointed out here 
that the Central agency (Directorate of NSS) is a quasi-permanent agency purely 
for statistical surveys, the investigators (enumerators) being whole-time employees 
with considerable experience of data collection by the interview method. The State 
agencies participated on an ad-hoc basis, and employed their normal staff, usually 
of the Land Revenue and Agricultural Departments, for this purpose. An enquiry 
into landholdings and several other socio-economic enquiries were taken up 
simultaneously by the Central agency but the State agencies took up the first enquiry only. 


56. It may be pointed out that even when observed differences are ‘statistically 
significant’, whether this is of any practical importance or not requires to be 
judged against the ‘permissible error’ for the purpose in view. The present enquiry, 
for example, was made primarily to collect information to decide broad policies of land 
redistribution; for such purposes the concentration curves (shown in Charts 1, 2, and 
3) can supply very useful information. The sample holdings were arranged in 
ascending order of size and accumulated; the estimated cumulative percentage 
of holdings is shown on the horizontal scale, and the estimated cumulative percentage 
of the area under holdings on the vertical scale. In Chart 1, which relates 
to ‘household ownership holding’, three concentration curves are shown separately, 
one for the Central sample, and two for State samples for which the information 
was collected respectively by party 1 and party 2 of investigators. Chart 2 is 
similar to Chart 1 but relates to ‘household operational holding’ (which is defined 
as ‘area owned’ plus ‘area leased in’ minus ‘area leased out’ by the household). In 
Chart 3, the two State samples have been pooled together; and only two 
concentration curves are shown respectively for the Central and the State sample, for 
‘agricultural holding’ in the upper pair of curves, and for ‘(total) operational holding’ 
(which is constituted of all land under one distinct technical and economic unit, so 
that in certain cases more than one household may be associated with the same ‘(total) 
operational holding’ and the same household may be associated with more than one 
‘(total) operational holding’) in the lower pair of curves. The real issue is whether 
the policy decision would remain the same by using either of the two concentration 
curves based respectively on the Central and State samples. In the present case, 
the divergence between the Central and the State concentration curves is so small 
that policy decisions would remain practically the same whether the Central or the 
State concentration curve is used; the divergence between the two (Central and 
State) concentration curves may therefore be considered to be well within the limits 
of permissible error. It may be added that the converse case may also arise. The 
observed divergence between concentration curves or other results based on two 
sub-samples of an interpenetrating network of samples may not be statistically 
significant and yet be wider than the limits of ‘permissible error’ for any given purpose. 
The statistically ‘significant’ differences which will be found in some cases in the 
following paragraphs, while indicating some lack of control, should be judged, 
however, in the light of the observations made here. 


CHART (3): CONCENTRATION CURVES FOR AGRICULTURAL HOLDINGS 
AND (TOTAL) OPERATIONAL HOLDINGS 
LANDHOLDINGS ENQUIRY, 1954-55. RURAL SECTOR: ALL-INDIA 
Central sample (number of households: 24366); State sample (number of villages: 3021; 
number of households: 51354). 
Horizontal scale: cumulative percentage of holdings. Vertical scale: cumulative 
percentage of area under holdings. 
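
The construction of a concentration curve, and a comparison of two such curves against a stated permissible error, can be sketched as follows. This is a simplified illustration only: the holding sizes are invented, every sample holding is given equal weight (the survey estimates would carry sampling multipliers), and the names `concentration_curve`, `area_share_at` and `max_divergence` are ours.

```python
def concentration_curve(sizes):
    """Return points (cumulative % of holdings, cumulative % of area).

    Holdings are arranged in ascending order of size and accumulated.
    """
    ordered = sorted(sizes)
    total_area = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for rank, size in enumerate(ordered, start=1):
        running += size
        points.append((100.0 * rank / n, 100.0 * running / total_area))
    return points


def area_share_at(points, pct_holdings):
    """Linearly interpolate the cumulative % of area at a given % of holdings."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= pct_holdings <= x1:
            return y0 + (y1 - y0) * (pct_holdings - x0) / (x1 - x0)
    raise ValueError("pct_holdings must lie between 0 and 100")


def max_divergence(curve_a, curve_b, grid=range(0, 101, 5)):
    """Largest vertical gap between two curves over a grid of abscissae."""
    return max(abs(area_share_at(curve_a, x) - area_share_at(curve_b, x))
               for x in grid)


# Hypothetical holding sizes (acres) from two agencies.
central = [0.0, 0.5, 1.2, 2.0, 3.5, 5.0, 8.0, 12.0, 20.0, 40.0]
state = [0.0, 0.0, 1.0, 2.5, 3.0, 6.0, 7.5, 14.0, 18.0, 45.0]

gap = max_divergence(concentration_curve(central), concentration_curve(state))
permissible = 5.0  # assumed permissible error, in percentage points of area
print(f"maximum divergence: {gap:.2f} points; "
      f"within permissible error: {gap <= permissible}")
```

The choice of permissible error is a matter for the user of the results, not for the statistician; the sketch merely makes explicit that the comparison is against that limit and not against a significance level.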


57. We are giving in Table 7 the all-India (rural) estimates provided by the 
Central and State agencies for some basic characters. The corresponding standard 
errors are also shown. For every State separate estimates were obtained for each 
of the sub-samples; these facilitated the estimation of the standard errors. It will 
be noticed that the Central estimate is significantly larger than the State estimate in 
respect of ‘number of households’² as well as the ‘acreage operated’. In case of 
‘area owned’, although the former estimate is larger, the difference is not statistically 
significant. We arrive at the same conclusion by a simpler method to be explained 
immediately. 


1 The ‘pooled’ concentration curve which would lie between the other two curves is not shown 
in any of the charts in order to avoid confusion arising from very close superimposition of 

TABLE 7. COMPARISON OF CENTRAL AND STATE ESTIMATES FOR SOME BASIC 
CHARACTERS: LANDHOLDINGS ENQUIRY, 1954-55 

Rural sector: All-India 

number of sample villages: central sample 1410; state sample 2805 
number of sample households: central sample 24366; state sample 47432 

                                  central              state                       difference 
character                    estimateᵃ   std.     estimateᵃ   std.     actualᵃ    p.c.      std. error   diff./ 
                                         errorᵃ               errorᵃ   (2)-(4)    (6)/(2)   of diff.ᵃ    std. error 
(1)                            (2)       (3)        (4)       (5)       (6)       (7)       (8)          (9) 

1. number of households       64.863    0.848     62.892     0.504     1.971     3.0%      0.986        2.00* 
2. total acreage operated 
   by rural households       348.151    7.553    330.402     4.934    17.749     5.1%      9.022        1.97* 
3. total acreage owned by 
   rural households          307.037    7.148    304.992     4.780     2.045     0.7%      8.599        0.24 

* Significant at 5 per cent level. 
ᵃ Figures in millions. 
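
The difference columns of Table 7 can be reproduced from the estimates and their standard errors. The sketch below assumes that the Central and State samples are independent, so that the variances add, and takes the State estimate of the number of households as 62.892 million, which is the value implied by the printed difference of 1.971. The function name `compare_estimates` is ours.

```python
from math import sqrt


def compare_estimates(central, se_central, state, se_state):
    """Return (difference, per cent difference, s.e. of difference, ratio)."""
    difference = central - state
    pct_difference = 100.0 * difference / central
    # Independent samples: variance of the difference is the sum of variances.
    se_difference = sqrt(se_central ** 2 + se_state ** 2)
    return difference, pct_difference, se_difference, difference / se_difference


# (central estimate, s.e., state estimate, s.e.), figures in millions.
table_7 = {
    "number of households": (64.863, 0.848, 62.892, 0.504),
    "total acreage operated": (348.151, 7.553, 330.402, 4.934),
    "total acreage owned": (307.037, 7.148, 304.992, 4.780),
}

for character, row in table_7.items():
    diff, pct, se_diff, ratio = compare_estimates(*row)
    print(f"{character}: diff={diff:.3f} ({pct:.1f}%), "
          f"s.e.={se_diff:.3f}, ratio={ratio:.2f}")
```

A ratio near 2 corresponds roughly to the 5 per cent level under a normal approximation, which is how the first two characters come to be marked significant and the third not.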


58. There are other characters for which comparisons between Central and 
State estimates are possible. For some of these comparisons simpler methods have 
been adopted. The general nature of the discrepancy between the Central and State 
estimates may be surmised by considering (1) the variation of the sub-sample estimates, 
and (2) the consistency, if any, in the nature of the set of discrepancies for a group 
of territorial divisions. 


59. We have in Tables 8 and 9 provided all the 12 sub-sample all-India 
(rural) estimates, 4 for the Central and 8 for the States, for five classes of characters, 
namely, (1) aggregates, (2) average size, (3) rates per capita, in Table 8; and (4) 
distribution of households by size of holding, and (5) the area operated for each holding 
size class, in Table 9. 


60. It may be pointed out incidentally that a few basic sub-sample estimates 
provide a simple means of obtaining sub-sample estimates of derived statistics, and 
thus the simple t-test which we have applied to the basic characters is also easily 
applied to the derived statistics. Thus we have given the results for the derived 
characters: total acreage leased in, the average sizes, and the rates per capita. 
Although not shown here, one can test out, from what is given, the significance of 
the difference in respect of average size for each holding size class. 
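
How sub-sample estimates of a derived statistic lead to a t-test may be sketched as below. The twelve pairs of sub-sample figures are invented for illustration, and the particular form of the test (a pooled two-sample t with 4 + 8 - 2 = 10 degrees of freedom) is an assumption on our part; the text says only that a simple t-test was applied.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Two-sample t statistic with pooled variance; returns (t, d.f.)."""
    na, nb = len(sample_a), len(sample_b)
    pooled_var = ((na - 1) * variance(sample_a)
                  + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (mean(sample_a) - mean(sample_b)) / standard_error, na + nb - 2


# Hypothetical sub-sample estimates (millions): acreage operated and
# number of households, 4 Central and 8 State sub-samples.
central_area = [350.0, 340.0, 355.0, 347.0]
central_households = [65.0, 64.0, 66.0, 64.5]
state_area = [332.0, 328.0, 335.0, 326.0, 331.0, 329.0, 334.0, 327.0]
state_households = [63.0, 62.5, 63.5, 62.0, 63.2, 62.8, 63.4, 62.6]

# Derived statistic: average area operated per household, per sub-sample.
central_avg = [a / h for a, h in zip(central_area, central_households)]
state_avg = [a / h for a, h in zip(state_area, state_households)]

t, df = pooled_t(central_avg, state_avg)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

The point of the device is that each sub-sample yields its own value of the derived statistic, so no separate variance formula for a ratio is needed.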


2 The Central and State samples were drawn quite independently, with probability proportional 
to 1951 population, and there were a few, usually large, common villages (villages were the first stage 
units). In these villages the households were listed independently before the sample households were 
selected for the enquiry. A direct comparison of the exact number enumerated is therefore possible. 
The Central agency recorded on the average 11 households against 10 recorded by the State agencies. 


61. It will be noticed from Tables 8 and 9 that the Central and State 
estimates are not always in agreement with each other. It is found that the State 
sample provides lower estimates of aggregates compared to the Central sample. 
There appear to be significant differences in regard to the estimation of ‘total number 
of households’, ‘total number of operational holdings’, ‘total area operated’ and 
‘total (net) area leased in’ by all rural households from urban households. For these 
aggregates the State sample under-estimates by 3.4 per cent, 7.4 per cent, 6.0 per 
cent and 37.6 per cent respectively. 


62. It is of interest to note that the State sample provides a significantly 
lower estimate of ‘total number of households’, but the ‘average household size’ for 
that sample is significantly higher compared to the Central sample, and the product 
of the two, namely, ‘total population’ is not significantly different. It is difficult 
to interpret the result with confidence. One of the possibilities is that the State 
sample failed to cover fully the very small households, and the contribution of those 
small households to the total population size is small compared to the sampling 
fluctuations so that the above test failed to detect the discrepancy. Another 
possibility is that the State agencies in preparing the list of households in the sample 
villages leaned more on the list of cultivators which they had for purposes of land 
revenue, and moreover, they may not have always taken into account any recent 
partitioning of households which might really have taken place, if recording of such 
information did not make any material difference in regard to collection of land 
revenue. 

63. It is only logical to compare the characteristics for each State separately, 
because agencies differed from State to State. It is to be noted, however, that the 
reduction of the sample size at the State level may make the detection of differences 
more difficult. It may on the other hand be pointed out that when pooled to 
all-India level the State bias (and errors) will balance to some extent and this may also 
make detection at all-India level difficult, even though sampling error may be reduced. 
We have therefore found it desirable to present a summary picture separately for the 
more important States also. For this purpose we have chosen the characteristic ‘per 
capita area operated’, the Central and State estimates of which do not differ significantly 
at the all-India level. We have also chosen for further examination the 
characteristics ‘number of households’, ‘number of operational holdings’, ‘total area operated’ 
and ‘average household size’, for which the two estimates differ significantly. 


64. It will be seen from Table 10 that in the majority of the States the sign of 
the difference between the Central and State estimates is the same as at the all-India 
level (shown in an earlier Table) for characteristics for which the difference is 
significant at the all-India level. Thus out of 14 comparisons the similarity holds 
in 10 cases for ‘total number of households’, 11 cases for ‘total number of 
operational holdings’, 12 cases for ‘total area operated’, and 10 cases for ‘average 
household size’. It is interesting to note that although the ‘per capita area operated’ 
estimates do not differ significantly at the all-India level, yet in as many as 13 cases 
the Central estimate exceeds the State estimate. This is, of course, a significant 
result. 
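
The significance of 13 agreements in sign out of 14 can be judged by a sign test. The sketch below assumes that, in the absence of any agency bias, each State would be equally likely to show either sign independently of the others; the function name `sign_test_two_sided` is ours.

```python
from math import comb


def sign_test_two_sided(successes, trials):
    """Two-sided sign-test probability under equal chances for either sign."""
    extreme = max(successes, trials - successes)
    upper_tail = sum(comb(trials, i) for i in range(extreme, trials + 1))
    return min(1.0, 2.0 * upper_tail / 2 ** trials)


# Number of States (out of 14) in which the sign agrees with the all-India sign.
for character, cases in [
    ("per capita area operated", 13),
    ("total area operated", 12),
    ("total number of operational holdings", 11),
    ("total number of households", 10),
]:
    p = sign_test_two_sided(cases, 14)
    print(f"{character}: {cases}/14, two-sided probability = {p:.4f}")
```

On this reckoning 13 out of 14 has a probability of about 0.002, whereas 10 out of 14 taken alone would not be significant at the 5 per cent level; the consistency across characters is what lends weight to the latter.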

65. Going back to the earlier Table 9 we note the sub-sample estimates of 
the distribution of households by the six size-groups of household operational hold- 
ings; (the sizes were obtained correct to two places of decimals, so that 0.00 means 
less than .005 acres)—the size classes being 0.00, 0.01-0.09, 0.10-0.99, 1.00—4.99, 
5.00-9.99 and 10.00 & above. The corresponding acreages are also shown for each 
group (excepting for 0.00). It will be seen that the differences are significant for all 
the groups excepting 0.10-0.99 and 10.00 & above. There is considerable difference 
between the two estimates for the two lowest size classes, 0.00 and 0.01-0.09. (The 
State sample has registered a very large proportion in the 0.00 class). But the 
difference ceases to be significant when the two lowest size classes are merged together. 
It is difficult to offer any very convincing explanation of the phenomenon without 
further probing analysis. It is not known whether the State investigators were 
inclined to record more approximate figures (so that rounding off to 0.00 would be 
more frequent), or whether thinking that the main purpose of the enquiry is to 
collect information about agricultural holdings they have not paid adequate attention 
to recording areas under house-site. 


66. Some of the total landholdings may be used exclusively for 
non-agricultural purposes. Excluding these we obtain holdings each of which is wholly or 
partly put to agricultural use. Zonal³ distribution of such agricultural holdings is 
shown in Table 11. 


67. It will be noticed that relative to the Central sample the State sample 
under-estimates the number of such holdings (by 9.57 per cent). This feature is 
to be found in all the six zones. We have previously seen that taking total 
landholdings (which are comprised of all agricultural as well as non-agricultural lands) 
we have a similar phenomenon (under-estimation by 4.33 per cent) and it is pertinent 
to enquire whether or not the position regarding agricultural holdings is merely a 
reflection of the other phenomenon about total landholdings. It will be noticed 
from Table 11 that not only does the State sample under-estimate the (total) operational 
holdings but even the proportion of agricultural holdings to (total) operational 
holdings is practically uniformly lower in the State sample. This may happen if the 
State sample behaves in the following manner: (1) larger omission of agricultural 
holdings, possibly the smaller ones, and/or (2) undue failure to record any agricultural 
utilisation, again, possibly in some of the smaller holdings. An examination of the 
data (not reproduced here) shows that in the lower size-classes the State sample registers 
a lower percentage of agricultural holdings. It will also be noticed in Table 11 that 
the average size of agricultural holding is greater in the State sample, possibly because 
of omissions of some of the smaller holdings. 


68. The survey design described at the beginning of this section is incomplete 
in one respect. This is in regard to the assignment of different sub-samples to 
different parties of investigators. It will be recalled that there were four sub-samples 
in the Central sample; two of them were covered by the first Central party and the 
remaining by the second Central party. There were similarly two parties of State 
investigators. In our earlier discussions we have ignored the question of party 
differences. 


3 In the 1951 Population Census the States were suitably grouped to form six Population Zones. 


TABLE 12. ANALYSIS OF VARIANCE OF AVERAGE AREA OWNED PER HOUSEHOLD: 
LANDHOLDINGS ENQUIRY, 1954-55 

                                  sum of squares               mean square                     F 
state                        agency    party     error     agency    party    error     agency        party 
                             d.f.=1    d.f.=2    d.f.=8 
(1)                           (2)       (3)       (4)       (5)       (6)      (7)       (8)           (9) 

 1. Uttar Pradesh            0.252     0.176     0.150     0.252     0.088    0.019     13.26**       4.63* 
 2. Bihar                    0.004     0.246     0.919     0.004     0.123    0.115     28.75(r)      1.07 
 3. Orissa                   0.001     0.381     1.864     0.001     0.190    0.233     233.00(r)     1.23(r) 
 4. West Bengal              0.583     0.249     0.998     0.583     0.124    0.125     4.66          1.01(r) 
 5. Assam                    1.480     0.050    17.776     1.480     0.025    2.222     1.50(r)       88.88*(r) 
 6. Andhra                   0.064     0.381     2.740     0.064     0.190    0.342     5.34(r)       1.80(r) 
 7. Madras                   0.059     0.080     1.230     0.059     0.040    0.154     2.61(r)       3.85(r) 
 8. Mysore                   0.254     0.0016    7.281     0.254     0.0008   0.910     3.58(r)       1137.5**(r) 
 9. Travancore & Cochin      0.441     0.631     2.395     0.441     0.316    0.299     1.47          1.06 
10. Bombay                   0.175     0.919     3.778     0.175     0.460    0.472     2.70(r)       1.03(r) 
11. Saurashtra              15.072     5.941    71.547    15.072     2.970    8.943     1.69          3.01(r) 
12. Madhya Pradesh           0.037     1.673     4.935     0.037     0.836    0.617     16.68(r)      1.35 
13. Madhya Bharat            0.134     3.266     9.278     0.134     1.633    1.160     8.66(r)       1.41 
14. Hyderabad                4.673     2.422    12.922     4.673     1.211    1.615     2.89          1.33(r) 
15. Vindhya Pradesh          6.120     1.703    16.394     6.120     0.852    2.049     2.99          2.40(r) 
16. Rajasthan                1.649     4.819    45.394     1.649     2.410    5.674     3.44(r)       2.35(r) 
17. Punjab                   1.397     3.945     9.973     1.397     1.972    1.247     1.12          1.58 
18. PEPSU                    0.771     4.990    25.181     0.771     2.495    3.148     4.08(r)       1.26(r) 
19. Jammu & Kashmir          0.0004    0.083     1.334     0.0004    0.042    0.167     417.50*(r)    3.98(r) 

F.05(1, 8) = 5.32.   F.01(1, 8) = 11.26.   F.05(8, 1) = 238.9.   F.01(8, 1) = 5982. 
F.05(2, 8) = 4.46.   F.01(2, 8) = 8.65.    F.05(8, 2) = 19.37.   F.01(8, 2) = 99.37. 
(r) indicates error/party or error/agency in F-ratio. 
* Significant at 5 per cent level.   ** Significant at 1 per cent level. 


69. In order to test whether there is any significant difference between the 
party estimates (and between Central and State estimates) the technique of analysis 
of variance has been applied to the character ‘area owned per household’ as estimated 
from the survey. The results for each of the 19 States (having 12 replications or 
sub-samples each) are shown in Table 12. The total degrees of freedom (11) has 
been split up as follows—between agency (1), between parties within agency (2), 
and error (8). 
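
The arithmetic behind each row of Table 12 can be sketched as follows. The sums of squares are those printed for Uttar Pradesh and Assam; the function name `anova_row` is ours, and the convention of inverting the ratio when the error mean square is the larger follows the note "(r)" under the Table. Since the Table works from mean squares rounded to three decimals, the ratios computed here may differ slightly from the printed ones.

```python
def anova_row(ss_agency, ss_party, ss_error, df=(1, 2, 8)):
    """Mean squares and F-ratios for one State.

    Returns a dict mapping 'agency' and 'party' to (F, reversed), where
    reversed is True when the ratio is taken as error / effect.
    """
    ms_agency = ss_agency / df[0]
    ms_party = ss_party / df[1]
    ms_error = ss_error / df[2]

    def f_ratio(ms_effect):
        if ms_effect >= ms_error:
            return ms_effect / ms_error, False
        return ms_error / ms_effect, True

    return {"agency": f_ratio(ms_agency), "party": f_ratio(ms_party)}


# Sums of squares (agency, party, error) as printed in Table 12.
states = {
    "Uttar Pradesh": (0.252, 0.176, 0.150),
    "Assam": (1.480, 0.050, 17.776),
}

for state, sums in states.items():
    for source, (f_value, is_reversed) in anova_row(*sums).items():
        note = " (r)" if is_reversed else ""
        print(f"{state}, {source}: F = {f_value:.2f}{note}")
```

When the ratio is reversed the degrees of freedom are interchanged as well, so that, for instance, a reversed party ratio is referred to F(8, 2) and not to F(2, 8).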

70. It will be noticed that there were significant differences between parties 
within agency for the States of Uttar Pradesh, Assam and Mysore. Agency differences 
were significant for Uttar Pradesh and Jammu & Kashmir. 


VI. SAMPLE SURVEY FOR ESTIMATION OF RATE OF GROWTH OF POPULATION 
1958-59, 1959-60 


71. It is not an uncommon practice in the NSS to have a built-in system of 
checking in the survey design. There is in the first instance the system of completely 
independent field work as well as processing—one by the Central agency and the 
other by State agencies. This interpenetrating arrangement provides two valid 
but independent estimates of all characters under study. 

72. There are other systems which provide comparisons, the elements of 
which are all provided by the same agency. We shall illustrate one such method 
used in our current plan in regard to the estimation of rate of growth of population 
which is obviously of fundamental importance in any planning for national develop- 
ment. The survey, conducted by the interview method, is restricted to the rural 
areas in the first instance. Our vital registration system being rather unreliable 
we had to explore the possibilities of sample survey methods. 

73. Control of sampling and other errors is sought to be achieved in two 
ways. First, control of sampling error by having a fairly large number of sample 
villages and canvassing necessary information from each and every household in 
them. It is also believed that the prima facie acceptability or non-acceptability of 
vital rates for the sample villages obtainable from their complete coverage has helped 
to reduce and control errors of omission etc. Again complete coverage has helped 
cross-checking of dates of births and deaths in the neighbouring households by 
studying the time sequence and the interval between those events; the deaths to single 
member households, no longer existing at the time of enquiry, are also obtainable 
only on the complete enumeration of villages by enquiring about such cases in, say, 
every 10th household. Also the village as a sample unit has definitely better technical 
and practical advantages over a sample of households in re-enumerating the 
population, narrated later. Secondly, by making the reference period sufficiently long, 
sampling fluctuations are reduced further. This step naturally increased the 
magnitude of ascertainment biases due mainly to recall lapse. A short reference 
period would increase the sampling error itself beyond desirable limits as the resources 
at our disposal did not permit us to increase the number of sample villages. 
Moreover, there were reasons like relatively larger border bias if too short a reference period 
was chosen. 

74. By border bias we mean undue (net) inclusion or exclusion of vital events 
actually occurring at points of time around the end-points of the reference period. 
In the NSS the normal practice is to have as reference a period of specified length 
immediately preceding the date of interview, and such border bias is likely to be 
more serious near the remoter end-point. 


2219 It is the general practice in NSS to introduce in its earlier rounds some studies of a ‘pilot’ nature 
before taking up the subject in question in a full-fledged manner. These studies are not small scale ad-hoc 
ones, but are extensive surveys so that the effect of large-scale operational conditions may be adequately 


75. It has been planned to collect data with a 2-year reference period and 
have such additional information on the time of actual occurrence that one can work 
with any shorter reference period at the tabulation stage. It is felt that although 
there would still be border errors for any smaller reference period yet it will mainly be 
of a random nature with the result that the net effect (bias) would be smaller than 
what it would be if the reference period of collection was identical with the reference period 
of analysis. Conscious efforts made by the investigator and/or informant for the 
correct placement of a vital event in relation to the (remoter) cut-off point are believed 
to result in larger biases than what would result from a cut-off at the tabulation 
stage. It should be pointed out here that the rural population has usually no precise 
knowledge of the exact date of occurrence of a birth or death. 


76. After establishing all these precautions it is necessary to have a 
self-evaluating system. In this connection we shall take up several points one by one. 
NSS investigators have acquired some experience in collection of birth and death 
data in earlier rounds. But it is not known whether they have reached a ‘steady’ 
state so that there is no further learning-effect. Evaluation in this regard is 
necessary. If the learning-effect is still found to be present then caution should be exercised 
in interpreting the results. For this and other purposes the NSS investigation is 
spread uniformly over a one-year round. To be more specific the entire round has 
been broken up into six two-month sub-rounds and data have been separately 
tabulated for each sub-round. Sub-round comparisons will be of help in studying the 
learning-effect. 


77. It is considered desirable for the sake of accuracy to enumerate each and
every individual in all the households in the sample villages. This will help us to
have not only a correct picture of the number in different age-sex-marital status
classes but would also help us to record more correctly all births and deaths occurring
to a member of the household canvassed. But as a multi-purpose survey organisation
the NSS has to explore the effect of adopting a less time-consuming plan so
that resources for other enquiries may be found. It has therefore been decided that
in the 14th round only the first two sub-rounds would have a detailed individual by
individual enumeration, whereas in the later sub-rounds summary information on a
household basis would be collected. A comparison of the results thrown up by the
two methods of varying intensity will therefore be of value in assessing the results
thrown up by the less intensive enquiry.


78. There is also the basic problem of evaluation of the very approach of
estimating birth and death rates as described earlier. A second more direct approach,
particularly in regard to deaths, may be helpful in studying the efficacy of this
approach. In the second approach an account has been taken, in the fifth and sixth
sub-rounds of the 14th round, of the whereabouts of each and every individual
enumerated in the first two sub-rounds,¹¹ and this resulted in recording deaths if the
person in question had died. Cases of migration, both in and out, have also been
noted. Additions due to births and cases of omission in either set of sub-rounds are also
recorded. It may be pointed out that babies born immediately before the date of
interview, even if omitted in the first two sub-rounds (which may not be a very
unlikely event), are less likely to be omitted at the re-enumeration stage when they
have somewhat grown up (provided they have not died in the intervening period).


79. At the re-enumeration stage an attempt has also been made to collect 
data about births and deaths during the reference period. It will be seen from what
has been said above that the second approach (of which, strictly speaking, the
first approach is a part) is expected to be less liable to omission of deaths and
to some extent of births as well. But both of them are likely to suffer from 
under-enumeration of cases of births and deaths of very short-lived infants. However, 
it may be pointed out that as our principal objective is to determine the rate of 
increase such cases would not affect (excepting in a marginal way) the results. 


80. It must be emphasised, however, that exploring the possibility of 
estimation of birth and death rates fairly accurately has been also kept in view. It 
therefore becomes necessary to provide for a method of assessing the magnitude of 
the recall biases. Tabulation of data by varying lengths of recall periods will throw 
some light, but in addition a more direct approach appears desirable. 


81. The evaluation plan takes into account not a single annual round but 
two consecutive rounds (the 14th and 15th rounds of NSS). According to this plan 
the same set of villages are being covered in these rounds. It will be recalled that 
our reference period is ‘last two-years’ so that, what is ‘last year’ in the earlier round 
becomes ‘year before last’ for the next round. We shall therefore have two sets 
of data for the same reference period, but with different recall periods. The 15th 
round is currently going on and it is hoped that the two sets will provide a means 
for evaluation of the accuracy of such data collected by the interview method. It 
will also provide a method of adjustment of data subject to recall lapse. 

82. We shall now present some of the preliminary results obtained so far. 
Table 13 sets down the estimates of (1) average household size, (2) sex-ratio, (3) percentages
of the population falling in the three age groups: 0-14, 15-44 and 45-, (4) birth-rate,
(5) death-rate and (6) rate of natural increase for each of the six sub-rounds
of the 14th round (the first round for the 2-round enquiry). There are two parties
of investigators; in every stratum one set of six villages is to be covered by an investi- 
gator belonging to the first party, and a second set of six by an investigator belonging 
to the second party. It is thus possible to obtain two valid and independent sets 


11 To ensure careful re-enumeration work, certain fictitious names have been entered in the original
listing. And a new item of information, viz., ‘days sick during last month’, has to be entered (in the fifth






of estimates. These are also shown. The reference period is ‘last year’ for births
and deaths.


83. It appears on examination of all possible long versus short schedule com- 
parisons that the shorter schedule gives rise to larger estimates of average household 
size. In regard to sex-ratio, there is no obvious difference. For the age groups, 
particularly the two older groups 15-44 and 45-, there is clear evidence of difference 
in the two schedule types. Under-estimation of the oldest group and over-estimation
of the middle group in the short schedule is obvious. For the younger group,
0-14, within party (or within sub-sample) comparisons show that the short schedule
gives underestimates; 14 out of 16 possible comparisons support this.


TABLE 13. COMPARISON OVER SUB-ROUNDS, SCHEDULE TYPES AND SUB-SAMPLES OF
CERTAIN DEMOGRAPHIC VARIABLES; POPULATION ENQUIRY, 1958-59
Rural sector: All-India
sample size: 218 villages/sub-sample/sub-round
             2616 villages during entire round

sub-    schedule  sub-sample  average    sex      percentage in age-group    rates per 1000 persons
round   type      or party    household  ratio a  -----------------------    ----------------------
                              size                0-14     15-44    45-      birth    death    growth
(1)     (2)       (3)         (4)        (5)      (6)      (7)      (8)      (9)      (10)     (11)

1       long      1           5.07       104.12   40.70    42.84    16.46    38.37    18.98    19.39
                  2           5.01       103.97   40.54    42.79    16.67    38.56    19.12    19.44
                  pooled      5.04       104.05   40.62    42.82    16.56    38.47    19.05    19.42

2       long      1           4.97       101.50   40.90    42.65    16.45    38.76    19.50    19.26
                  2           5.11       103.86   41.03    42.77    16.20    40.45    19.89    20.56
                  pooled      5.04       102.74   40.97    42.72    16.31    39.66    19.70    19.96

3       short     1           5.16       102.61   40.45    45.16    14.39    37.49    19.82    17.67
                  2           5.15       103.84   40.13    45.36    14.51    35.00    18.81    16.19
                  pooled      5.16       103.17   40.31    45.25    14.44    36.36    19.36    17.00

4       short     1           5.21       104.94   40.65    45.36    13.99    41.63    21.57    20.06
                  2           5.21       105.47   40.83    45.01    14.16    38.65    19.08    19.57
                  pooled      5.21       105.20   40.74    45.19    14.07    40.13    20.32    19.81

5       short     1           5.14       103.18   40.62    44.60    14.78    36.38    20.04    16.20
                  2           5.15       103.59   40.33    44.70    14.97    37.33    17.24    20.09
                  pooled      5.15       103.37   40.46    44.66    14.88    36.85    18.60    18.26

6       short     1           5.28       103.60   40.88    45.13    14.04    38.13    16.28    21.85
                  2           5.16       103.61   40.29    45.18    14.53    37.65    17.22    20.43
                  pooled      5.22       103.61   40.53    45.16    14.31    37.87    16.79    21.08

1-6     long and  1           5.14       103.30   40.68    44.31    15.01    38.50    19.47    19.09
        short     2           5.13       104.07   40.54    44.28    15.18    38.02    18.58    19.44
                  pooled      5.14       103.69   40.61    44.29    15.10    38.26    19.02    19.24


a Males per 100 females.
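
Two identities tie the columns of Table 13 together: the three age-group percentages should add to 100, and the growth rate should equal the birth rate minus the death rate. A minimal sketch of such a check is given below. The function name `check_row` and the tolerance are assumptions made for illustration, and only three rows of the table are reproduced.

```python
# Each row: (sub-round, party, pct 0-14, pct 15-44, pct 45-, birth, death, growth)
rows = [
    (1, "1", 40.70, 42.84, 16.46, 38.37, 18.98, 19.39),
    (1, "2", 40.54, 42.79, 16.67, 38.56, 19.12, 19.44),
    (3, "2", 40.13, 45.36, 14.51, 35.00, 18.81, 16.19),
]


def check_row(row, tol=0.015):
    """Return a list of the identities that a table row fails to satisfy."""
    _, _, young, middle, old, birth, death, growth = row
    problems = []
    if abs(young + middle + old - 100.0) > tol:
        problems.append("age percentages do not add to 100")
    if abs((birth - death) - growth) > tol:
        problems.append("growth differs from birth minus death")
    return problems


for row in rows:
    print(row[0], row[1], check_row(row) or "consistent")
```

A row that fails either identity is a candidate for a transcription or printing slip and is worth checking against the pooled figures for the same sub-round.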


84. Similar analysis of birth and death rates does not reveal any marked
difference between the two schedule-types excepting possibly for birth-rate which
is higher (13 out of 16 possible comparisons) for the long schedule.

85. It is also necessary to study the learning or familiarity effect. The
difference, if appreciable, will be revealed by an examination of the differences between
sub-rounds, other conditions remaining the same. We shall study this in respect
of birth rates. Concentrating on the first two sub-rounds we note that both the 
parties have registered a higher rate at the second sub-round showing perhaps some 
improvement in enumeration of births. To secure a little more confirmation we 
have examined this in relation to the five (administrative) Zones into which the 
country has been divided; 8 out of 10 possible within-party comparisons support this. 
It will be recalled that from the third sub-round onwards the short schedule has been 
used. The fourth sub-round again shows a higher birth-rate compared to the third 
not only for all India but for 9 out of 10 possible within-party zonal comparisons. 
In case of death-rate, however, there is no such phenomenon. It would appear 
therefore that there is a learning-effect in regard to births but nothing very much 
in respect of deaths. 

86. The existence of a learning-effect raises the question whether the
difference between the schedule types could not be explained by the learning-effect. We
would be inclined to feel that the nature and magnitude of the differences are such
that they cannot be explained in that way.

87. Preliminary analysis of the re-enumeration data collected in the fifth
and sixth sub-rounds shows that the annual death rate among persons enumerated
earlier comes out at 1.71%. (The interval between enumeration and re-enumeration
was roughly 8 months,¹² and the number of deaths etc. has been proportionately
increased to 12 months.) This may be seen against a death rate of 1.77% obtained by
the previous approach for the part of the enquiry conducted during the same
survey-period (fifth and sixth sub-rounds). It will be noted that inclusion of infants
born after the first enumeration and the deaths among them (before re-enumeration)
would have made the two figures comparable. As far as one can make out from our
estimate of birth rate, viz., 3.82%, and the infant death proportion of 99 per 1000 live
births (proportion of infants born alive and dying in the reference year) estimated
from this enquiry, the comparable death-rate comes to about 1.95%. It would appear
therefore that the method of accounting in re-enumeration has resulted in a more
complete enumeration of deaths. Further probing analysis is necessary before one can
be on surer grounds regarding this evaluation of the results.


88. It is necessary to examine the magnitude of recall-bias. A direct evaluation
of this bias will be possible, as explained earlier, when the 15th round data (yet
to be collected and analysed) are seen against the 14th round data for the same
villages and households. We can however study the effect in a less direct way from
the 14th round data. It is found that at the all-India level, the ‘year before last’
birth rate was 81.7% of the ‘last year’ birth rate. In regard to death rate the
corresponding percentage was substantially lower, viz., 52.6%. While the effect of the
recall-lapse is quite appreciable in case of the above two rates, it is interesting to
estimate the effect of this on the growth-rate, because there is a certain amount of
compensation, as both are under-estimates and the one (death rate) subject to higher
recall-lapse has a lower absolute value. The net effect is over-estimation of
growth rate by 10.5% if ‘year before last’ replaces ‘last year’.


12 The eight month gap is not entirely satisfactory; one year would be preferable, and the data
currently being obtained in the 15th round will remove this defect.
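
The 10.5% figure can be retraced with a few lines of arithmetic. The sketch below assumes that the ‘last year’ rates are the pooled all-India figures of Table 13 (38.26 births and 19.02 deaths per 1000 persons); the text does not say explicitly which figures were used, so this choice is an assumption that happens to reproduce the quoted result.

```python
# Assumed 'last year' rates per 1000 persons (pooled all-India row of Table 13).
birth_last_year = 38.26
death_last_year = 19.02

# Ratios of 'year before last' to 'last year' rates quoted in the text.
birth_ratio = 0.817
death_ratio = 0.526

growth_last_year = birth_last_year - death_last_year
growth_year_before = birth_last_year * birth_ratio - death_last_year * death_ratio

# Relative over-estimation of the growth rate when the older year replaces the recent one.
overestimate_pct = 100.0 * (growth_year_before / growth_last_year - 1.0)
print(round(growth_last_year, 2), round(growth_year_before, 2), round(overestimate_pct, 1))
```

Both rates fall under recall-lapse, but the death rate loses a larger share of a smaller quantity, so the difference between the two rises slightly instead of falling.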


89. The above however is a partial appraisal of the situation because the
question of recall-lapse within the course of the year has been overlooked. One
can, however, reasonably expect from what has been said above that the within-year
recall-lapse effect on the growth-rate cannot be very large. An actual study on the
7th round data showed that a slight increase of about 2 per cent in the growth rate
may perhaps be allowed for recall-lapse within the course of the one-year reference
period.


90. It would appear therefore that the 14th round NSS estimate of the growth-rate
is perhaps subject to a slight downward bias. It is however not quite clear at
this stage what the effect is of a 2-year reference period, introduced for the first time
in the 14th round, and how far the conditions obtaining in the 7th round are obtainable
in the 14th round. Deeper analysis of 14th and 15th round material may help
in clearing up the issues.


RÉSUMÉ

In developing techniques for the evaluation of survey results, due importance is given to the
following fact. The final result depends not only on the ‘true’ value of the unit under observation,
but also on the method of obtaining such information, unit by unit, and on the method of processing
the data.

The result may even depend on the choice of the ascertainment unit. The basic technique is
therefore the setting up of comparisons between two (or more) estimates of the same quantity obtained
under operational and/or technical conditions that are partly or wholly different.

A survey can be evaluated against data obtained by a method which is, a priori, more valid.
‘Unitary checks’, which can be introduced when the ascertainment units are identical, bring to light
the individual ascertainment errors.

The comparison between estimated ‘totals’, which is possible even when the ascertainment units are
different, gives the total error arising from all sources. Evaluation on randomly chosen units provides
a means of judging the effect of the reduced scope for compensation of errors from different sources
when estimates of ‘totals’, or some similar summary information, are required for (smaller) territorial
units. This evaluation is important, because a small total error for larger territorial units need not
indicate the same for small units. A technique is developed for separating the total ascertainment or
enumeration error from the total ‘compilation-editing’ error in evaluating a ‘total’ or similar summary
information.

In the case of interpenetrating samples, when two (or more) samples are drawn from the same
population and are treated under the same survey design, the results based on the different samples are
equally valid, even when they are obtained by different investigators and/or by different processing
units. Operational differences are reflected in the divergence between the different estimates. Moreover,
the estimate of the margin of uncertainty is based on this divergence.

A technique of ‘self-evaluation’ also appears to be useful. For this purpose data have to be
collected on ascertainment units of different ‘sizes’, some being parts of others, for example a small cut
within a larger cut (crop-cutting experiments). Data for a long reference period, collected in such a
manner that tabulation for a shorter reference period is also possible, give in effect different sizes of
reference period. A comparison of the estimates obtainable from the different sizes facilitates
evaluation.

The comparison of estimates at different points or periods of time is another method. Comparisons
of estimates for the same reference period (but at different survey periods) are also useful. It is worth
while to look for agreement in the pattern of divergence between different territorial sections.

All the techniques discussed are illustrated by results obtained in varied fields: crops, livestock,
property and population.


ON CONFIDENCE INTERVAL FOR TWO-MEANS PROBLEM
BASED ON SEPARATE ESTIMATES OF VARIANCES
By SAIBAL BANERJEE
Indian Statistical Institute


SUMMARY. Given k samples of $n_i$ units from k normal populations $N_i(m_i, \sigma_i^2)$ (i = 1, 2, ..., k),
a confidence interval for any linear function $\sum_{i=1}^{k} c_i m_i$ (where $c_i$ are known coefficients) with
confidence coefficient not less than some pre-assigned probability $\alpha$ is possible in terms of sample estimates
of population means and variances and tabulated values of Student's t-table. Some generalizations of
the result including testing of hypothesis have been considered.


1. INTRODUCTION

Given two samples of $n_i$ (i = 1, 2) units from two normal populations with
means $m_1$ and $m_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, a confidence interval for any linear function
$c_1 m_1 + c_2 m_2$ (where $c_1$ and $c_2$ are known coefficients) in terms of sample estimates of
population means and variances is of interest. It was shown by the author (Banerjee,
1960) that given k samples of $n_i$ units from k normal populations $N_i(m_i, \sigma_i^2)$ (i = 1, 2,
..., k) and some pre-assigned probability $\alpha$, a confidence interval for $\sum_{i=1}^{k} c_i m_i$, with
confidence coefficient not less than the pre-assigned probability $\alpha$, could be built up from the
relation

$$\mathrm{prob}\left[\Big\{\sum_{i=1}^{k} c_i(\bar{x}_i - m_i)\Big\}^2 \le \sum_{i=1}^{k} \frac{t_i^2 c_i^2 s_i^2}{n_i}\right] \ge \alpha \tag{1.1}$$

where $\bar{x}_i$ and $s_i^2$ (i = 1, 2, ..., k) are sample estimates of population means and
variances, $c_i$ (i = 1, 2, ..., k) are known coefficients and $t_i$ (i = 1, 2, ..., k) are so chosen
that

$$\int_{-t_i}^{t_i} \frac{\Gamma\left(\frac{\nu_i+1}{2}\right)}{\sqrt{\pi\nu_i}\,\Gamma\left(\frac{\nu_i}{2}\right)} \left(1+\frac{t^2}{\nu_i}\right)^{-\frac{\nu_i+1}{2}} dt = \alpha \qquad (\nu_i = n_i - 1;\ i = 1, 2, \ldots, k).$$

The probability statement (1.1), or the confidence interval associated with the
probability statement (1.1), is applicable to more general situations. Some generalizations
are considered in this paper.

2. NOTATIONS 


The following notations have been used throughout this paper: 


$f(\chi_i^2)$ denotes the frequency function of a $\chi_i^2$ variate which is distributed as
a $\chi^2$ variate with $\nu_i$ degrees of freedom.

$h(\chi^2)$ denotes the frequency function of a $\chi^2$ variate which is distributed as a
$\chi^2$ variate with one degree of freedom.

$f(t_{\nu_i})$ denotes the frequency function of a t-variate which is distributed as
Student's t-variate with $\nu_i$ degrees of freedom.

The terminology that $t_i$ (i = 1, 2, ..., k) are t-values of Student's t-table of
$\nu_i$ (i = 1, 2, ..., k) d.f. corresponding to confidence coefficient $\alpha$ denotes t-values of
Student's t-table so that

$$\int_{-t_i}^{t_i} f(t_{\nu_i})\,dt = \alpha.$$

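3. THEOREMS

Theorem 1: If u be a normal variate distributed about zero mean and unit
variance and $\chi_i^2$ (i = 1, 2, ..., k) be $\chi^2$ variates distributed mutually independently and
also independently of u with $\nu_i$ (i = 1, 2, ..., k) degrees of freedom and $w_i$ (i = 1, 2, ..., k)
be a set of arbitrary weights satisfying the relation $\sum_{i=1}^{k} w_i = 1$, then

$$\mathrm{prob}\left[u^2 \le \sum_{i=1}^{k} \frac{w_i t_i^2 \chi_i^2}{\nu_i}\right] \ge \alpha \tag{3.1}$$

where $t_i$ (i = 1, 2, ..., k) are tabulated values of Student's t-table of $\nu_i$ (i = 1, 2, ..., k)
degrees of freedom corresponding to confidence coefficient $\alpha$.

Proof: We have

$$\mathrm{prob}\left[u^2 \le \sum_{i=1}^{k} \frac{w_i t_i^2 \chi_i^2}{\nu_i}\right] = \int_0^\infty \!\!\cdots\! \int_0^\infty \prod_{i=1}^{k} f(\chi_i^2) \left\{\int_0^{z} h(\chi^2)\,d\chi^2\right\} d\chi_1^2 \cdots d\chi_k^2 \tag{3.2}$$

where

$$z = \sum_{i=1}^{k} \frac{w_i t_i^2 \chi_i^2}{\nu_i}.$$

As $\int_0^{z} h(\chi^2)\,d\chi^2$ is an upward convex function of z,

$$\int_0^{z} h(\chi^2)\,d\chi^2 \ge \sum_{i=1}^{k} w_i \int_0^{t_i^2\chi_i^2/\nu_i} h(\chi^2)\,d\chi^2. \tag{3.3}$$

It can be shown that

$$\int_0^\infty f(\chi_i^2)\left\{\int_0^{t_i^2\chi_i^2/\nu_i} h(\chi^2)\,d\chi^2\right\} d\chi_i^2 = \alpha \qquad (i = 1, 2, \ldots, k). \tag{3.4}$$

As $\sum_{i=1}^{k} w_i = 1$, from (3.2), (3.3) and (3.4),

$$\mathrm{prob}\left[u^2 \le \sum_{i=1}^{k} \frac{w_i t_i^2 \chi_i^2}{\nu_i}\right] \ge \alpha. \tag{3.5}$$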
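
Theorem 1 can be illustrated by simulation. The sketch below is only a numerical check under stated assumptions, not part of the proof: it takes k = 2 with 9 and 14 degrees of freedom, uses the familiar two-sided 90 per cent t-values 1.833 and 1.761 from a standard t-table, and restricts itself to non-negative weights. The function name `coverage` and the choice of weights are made up for the illustration.

```python
import random


def coverage(nus, t_values, weights, trials=100_000, seed=1):
    """Monte Carlo estimate of prob[u^2 <= sum_i w_i t_i^2 chi_i^2 / nu_i]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u = rng.gauss(0.0, 1.0)
        bound = 0.0
        for nu, t, w in zip(nus, t_values, weights):
            # A chi-square variate with nu d.f. is a gamma variate
            # with shape nu/2 and scale 2.
            chi2 = rng.gammavariate(nu / 2.0, 2.0)
            bound += w * t * t * chi2 / nu
        if u * u <= bound:
            hits += 1
    return hits / trials


nus = (9, 14)
t_values = (1.833, 1.761)  # two-sided 90 per cent points for 9 and 14 d.f.

for weights in [(1.0, 0.0), (0.5, 0.5), (0.1, 0.9)]:
    print(weights, round(coverage(nus, t_values, weights), 4))
```

With all the weight on one component the estimated probability should sit close to 0.90, and with the weight shared it should come out somewhat above 0.90, in line with (3.5).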


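Theorem 2: Let Y be a normal variate distributed about mean M with variance
$\sum_{i=1}^{k} \lambda_i\sigma_i^2 + \sum_{j=1}^{l} \lambda_{0j}\sigma_{0j}^2$, where $\lambda_i$ (i = 1, 2, ..., k) and $\lambda_{0j}$ (j = 1, 2, ..., l) are known positive
constants. If $\sigma_{0j}^2$ (j = 1, 2, ..., l) be known and $s_i^2$ be estimates of $\sigma_i^2$ (i = 1, 2, ..., k)
where $\nu_i s_i^2/\sigma_i^2$ (i = 1, 2, ..., k) are mutually independently (and also independently of Y)
distributed as $\chi^2$ with $\nu_i$ (i = 1, 2, ..., k) degrees of freedom, then

$$\mathrm{prob}\left[(Y-M)^2 \le \sum_{i=1}^{k} t_i^2\lambda_i s_i^2 + d^2\sum_{j=1}^{l}\lambda_{0j}\sigma_{0j}^2\right] \ge \alpha \tag{3.6}$$

where $t_i$ (i = 1, 2, ..., k) and d are respectively tabulated values of Student's t-table
of $\nu_i$ (i = 1, 2, ..., k) d.f. and the normal probability table corresponding to confidence
coefficient $\alpha$.

Proof: We have

$$\mathrm{prob}\left[(Y-M)^2 \le \sum_{i=1}^{k} t_i^2\lambda_i s_i^2 + d^2\sum_{j=1}^{l}\lambda_{0j}\sigma_{0j}^2\right] = \int_0^\infty \!\!\cdots\! \int_0^\infty \prod_{i=1}^{k} f(\chi_i^2)\left\{\int_0^{z} h(\chi^2)\,d\chi^2\right\} d\chi_1^2\cdots d\chi_k^2 \tag{3.7}$$

where

$$z = \sum_{i=1}^{k}\frac{w_i t_i^2\chi_i^2}{\nu_i} + \sum_{j=1}^{l} w_{0j}d^2,$$

$$w_i = \frac{\lambda_i\sigma_i^2}{R} \quad (i = 1, 2, \ldots, k), \qquad w_{0j} = \frac{\lambda_{0j}\sigma_{0j}^2}{R} \quad (j = 1, 2, \ldots, l),$$

$$R = \sum_{i=1}^{k}\lambda_i\sigma_i^2 + \sum_{j=1}^{l}\lambda_{0j}\sigma_{0j}^2,$$

and $f(\chi_i^2)$ denotes the frequency function of a $\chi_i^2$ variate which is distributed as a $\chi^2$
variate with $\nu_i$ degrees of freedom and $h(\chi^2)$ denotes the frequency function of a
$\chi^2$ variate with one degree of freedom.

As $\int_0^{z} h(\chi^2)\,d\chi^2$ is an upward convex function of z,

$$\int_0^{z} h(\chi^2)\,d\chi^2 \ge \sum_{i=1}^{k} w_i\int_0^{t_i^2\chi_i^2/\nu_i} h(\chi^2)\,d\chi^2 + \sum_{j=1}^{l} w_{0j}\int_0^{d^2} h(\chi^2)\,d\chi^2. \tag{3.8}$$

It can be shown that

$$\int_0^\infty f(\chi_i^2)\left\{\int_0^{t_i^2\chi_i^2/\nu_i} h(\chi^2)\,d\chi^2\right\} d\chi_i^2 = \alpha \quad\text{and}\quad \int_0^{d^2} h(\chi^2)\,d\chi^2 = \alpha. \tag{3.9}$$

As $\sum_{i=1}^{k} w_i + \sum_{j=1}^{l} w_{0j} = 1$, from (3.7), (3.8) and (3.9),

$$\mathrm{prob}\left[(Y-M)^2 \le \sum_{i=1}^{k} t_i^2\lambda_i s_i^2 + d^2\sum_{j=1}^{l}\lambda_{0j}\sigma_{0j}^2\right] \ge \alpha. \tag{3.10}$$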


Result 1: Let three samples of $N_1$, $N_2$ and $N_3$ units be drawn from three
normal populations with means $m_1$, $m_2$ and $m_3$ and variances not necessarily equal. Let
$(\bar{x}_i, s_i^2)$ (i = 1, 2, 3) be respectively estimates for the population means and variances
of the three populations. Suppose, based on previous experiments, we have estimates
(i) $\bar{x}_{10}$, $s_{10}^2$; (ii) $\bar{x}_{20}$ and (iii) $s_{30}^2$ respectively for (i) mean and variance of the first
population, (ii) mean of the second population and (iii) variance of the third population
based on $n_1$, $n_2$ and $n_3$ units. Then a confidence interval for any linear function
$\sum_{i=1}^{3} c_i m_i$ of population means, with confidence coefficient not less than some
pre-assigned probability $\alpha$, may be built up from the relation:

$$\mathrm{prob}\left[\Big\{\sum_{i=1}^{3} c_i(\bar{x}_i' - m_i)\Big\}^2 \le \sum_{i=1}^{3}\frac{t_i^2 c_i^2 s_i'^2}{N_i'}\right] \ge \alpha \tag{3.11}$$

where

$$N_1' = N_1 + n_1; \qquad \bar{x}_1' = \frac{N_1\bar{x}_1 + n_1\bar{x}_{10}}{N_1 + n_1};$$

$$N_2' = N_2 + n_2; \qquad \bar{x}_2' = \frac{N_2\bar{x}_2 + n_2\bar{x}_{20}}{N_2 + n_2};$$

$$N_3' = N_3; \qquad \bar{x}_3' = \bar{x}_3;$$

$$s_1'^2 = \frac{(N_1-1)s_1^2 + (n_1-1)s_{10}^2}{N_1 + n_1 - 2}; \qquad s_2'^2 = s_2^2; \qquad s_3'^2 = \frac{(N_3-1)s_3^2 + (n_3-1)s_{30}^2}{N_3 + n_3 - 2}$$

and $t_i$ (i = 1, 2, 3) are tabulated values of Student's t-table of $\nu_i$ d.f. ($\nu_1 = N_1 + n_1 - 2$,
$\nu_2 = N_2 - 1$ and $\nu_3 = N_3 + n_3 - 2$) corresponding to confidence coefficient $\alpha$.
Relation (3.11) follows directly from Theorem 2.


Result 2: From two samples $(x_{ij}, y_{ij};\ i = 1, 2;\ j = 1, 2, \ldots, n_i)$ of $n_i$
(i = 1, 2) units from two bivariate normal populations regression equations of y on
x are estimated as:

Population I : $Y_1 = \bar{y}_1 + b_1(X_1 - \bar{x}_1)$

Population II : $Y_2 = \bar{y}_2 + b_2(X_2 - \bar{x}_2)$.

If $\beta_1$ and $\beta_2$ denote respectively population values of regression coefficients of the
two populations, a confidence interval for $\beta_1 - \beta_2$ may be built up from the result:

$$\mathrm{prob}\left[\{(b_1 - b_2) - (\beta_1 - \beta_2)\}^2 \le \frac{t_1^2 s_1^2}{S_1} + \frac{t_2^2 s_2^2}{S_2}\right] \ge \alpha \tag{3.12}$$

where

$$S_i = \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2, \qquad s_i^2 = \frac{\sum_{j=1}^{n_i}(y_{ij} - Y_{ij})^2}{n_i - 2} \qquad (i = 1, 2)$$

and $t_i$ (i = 1, 2) are tabulated values of Student's t-table of $n_i - 2$ (i = 1, 2) d.f.
corresponding to confidence coefficient $\alpha$.

As $b_1$ and $b_2$ are distributed normally about means $\beta_1$ and $\beta_2$ with
variances say $\sigma_1^2/S_1$ and $\sigma_2^2/S_2$, and $s_1^2$ and $s_2^2$ are estimates of $\sigma_1^2$ and $\sigma_2^2$ with $n_1 - 2$ and
$n_2 - 2$ d.f., the result follows from Theorem 2.


4. SOME RESULTS ON THE TWO-MEANS PROBLEM

In this section four results on the two-means problem are presented
as under:

(1) proof that, restricting to the class of functions of the form $A_1 s_1^2 + A_2 s_2^2$, the only
function which with minimum values of $A_1$ and $A_2$ would satisfy the relation

$$\Big\{\sum_{i=1}^{2} c_i(\bar{x}_i - m_i)\Big\}^2 \le A_1 s_1^2 + A_2 s_2^2$$

with probability not less than any pre-assigned probability $\alpha$ is

$$\frac{t_1^2 c_1^2 s_1^2}{n_1} + \frac{t_2^2 c_2^2 s_2^2}{n_2};$$

(2) numerical values of upper bounds of probability of the inequality

$$\Big\{\sum_{i=1}^{2} c_i(\bar{x}_i - m_i)\Big\}^2 \le \frac{t_1^2 c_1^2 s_1^2}{n_1} + \frac{t_2^2 c_2^2 s_2^2}{n_2};$$

(3) proof that the confidence interval

$$\sum_{i=1}^{2} c_i\bar{x}_i \pm \sqrt{\sum_{i=1}^{2}\frac{t_i^2 c_i^2 s_i^2}{n_i}}$$

is unbiassed;

(4) non-central confidence intervals.

Throughout this section and the following sections, unless otherwise stated,
$\bar{x}_1$ and $s_1^2$ denote sample estimates of population mean $m_1$ and variance $\sigma_1^2$ based on a
sample of size $n_1$ drawn from a normal population $N(m_1, \sigma_1^2)$. Also $\bar{x}_2$ and $s_2^2$ denote
sample estimates of population mean $m_2$ and variance $\sigma_2^2$ based on a sample of size
$n_2$ drawn from a normal population $N(m_2, \sigma_2^2)$. Also $\nu_1$ (= $n_1 - 1$) and $\nu_2$ (= $n_2 - 1$)
denote degrees of freedom of $s_1^2$ and $s_2^2$. Also $c_1$ and $c_2$ denote known constants, positive
or negative, and $\alpha$ a pre-assigned probability level (between 0 and 1), and $t_i$ (i = 1, 2)
are tabulated values of Student's t-table of $\nu_i$ (i = 1, 2) d.f. corresponding to
confidence coefficient $\alpha$.
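Result 1: Let $P\{A_1, A_2\}$ denote the probability of the inequality

$$\Big\{\sum_{i=1}^{2} c_i(\bar{x}_i - m_i)\Big\}^2 \le A_1 s_1^2 + A_2 s_2^2. \tag{4.1}$$

It can be easily shown that $P\{A_1, A_2\}$ is equal to

$$\frac{1}{\Gamma(p_1)\Gamma(p_2)\Gamma(\tfrac{1}{2})}\int_0^\infty\!\!\int_0^\infty e^{-y_1-y_2}\, y_1^{p_1-1} y_2^{p_2-1}\left\{\int_0^{a_1 w_1 y_1 + a_2(1-w_1)y_2} e^{-z} z^{-1/2}\,dz\right\} dy_1\,dy_2 \tag{4.2}$$

where

$$p_i = \frac{\nu_i}{2}, \qquad a_i = \frac{A_i n_i}{c_i^2\nu_i} \quad (i = 1, 2), \qquad w_1 = \frac{c_1^2\sigma_1^2/n_1}{c_1^2\sigma_1^2/n_1 + c_2^2\sigma_2^2/n_2}.$$

Further, it can be shown that $P\{A_1, A_2\}$ as defined in (4.2) is continuous in $w_1$ for
$0 \le w_1 \le 1$.
Let numerical values of $A_1$ and $A_2$ be so taken that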
(226 


4i < = : TEE 9 
л ... (4.3) 


and - "e 
$ 4 4, 


With numerical values of $A_1$ and $A_2$ so determined by (4.3), as $w_1$ tends to unity
$P\{A_1, A_2\}$ tends to $\alpha'$ where $\alpha'$ is defined as

$$\alpha' = \frac{1}{\Gamma(p_1)\Gamma(\tfrac{1}{2})}\int_0^\infty e^{-y_1} y_1^{p_1-1}\left\{\int_0^{a_1 y_1} e^{-z} z^{-1/2}\,dz\right\} dy_1 \tag{4.4}$$

which is less than $\alpha$.
Since $P\{A_1, A_2\}$ is continuous in $w_1$ it is possible to choose a value of $w_1$ near
$w_1 = 1$ such that $P\{A_1, A_2\} < \alpha$. This means that even if $A_2$ is made arbitrarily
large,

$$\mathrm{prob}\left[\Big\{\sum_{i=1}^{2} c_i(\bar{x}_i - m_i)\Big\}^2 \le A_1 s_1^2 + A_2 s_2^2\right],$$

where $A_1 < t_1^2 c_1^2/n_1$, depending upon the value of $w_1$, would be less than the
pre-assigned probability level for some of the populations. Hence the only function of the
form $A_1 s_1^2 + A_2 s_2^2$ which, with minimum values of $A_1$ and $A_2$, would satisfy the relation

$$\Big\{\sum_{i=1}^{2} c_i(\bar{x}_i - m_i)\Big\}^2 \le A_1 s_1^2 + A_2 s_2^2$$

with probability not less than $\alpha$ is

$$\frac{t_1^2 c_1^2 s_1^2}{n_1} + \frac{t_2^2 c_2^2 s_2^2}{n_2}.$$
2 2 4 
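Result 2: From (4.2), $P\left\{\dfrac{t_1^2 c_1^2}{n_1}, \dfrac{t_2^2 c_2^2}{n_2}\right\}$, standing for the probability of the
inequality

$$\Big\{\sum_{i=1}^{2} c_i(\bar{x}_i - m_i)\Big\}^2 \le \frac{t_1^2 c_1^2 s_1^2}{n_1} + \frac{t_2^2 c_2^2 s_2^2}{n_2},$$

is equal to

$$\frac{1}{\Gamma(p_1)\Gamma(p_2)\Gamma(\tfrac{1}{2})}\int_0^\infty\!\!\int_0^\infty e^{-y_1-y_2}\, y_1^{p_1-1} y_2^{p_2-1}\left\{\int_0^{a_1 w_1 y_1 + a_2 w_2 y_2} e^{-z} z^{-1/2}\,dz\right\} dy_1\,dy_2 \tag{4.5}$$

where

$$p_i = \frac{\nu_i}{2}, \qquad a_i = \frac{t_i^2}{\nu_i} \quad (i = 1, 2),$$

$$w_1 = \frac{c_1^2\sigma_1^2/n_1}{c_1^2\sigma_1^2/n_1 + c_2^2\sigma_2^2/n_2} \quad\text{and}\quad w_2 = 1 - w_1.$$

$P\left\{\dfrac{t_1^2 c_1^2}{n_1}, \dfrac{t_2^2 c_2^2}{n_2}\right\}$ is essentially a function of $w_1$ and it tends to $\alpha$ when $w_1$ tends to
zero or unity. For values of $w_1$ between zero and unity $P\left\{\dfrac{t_1^2 c_1^2}{n_1}, \dfrac{t_2^2 c_2^2}{n_2}\right\}$ is numerically
greater than $\alpha$. Differentiating $P\left\{\dfrac{t_1^2 c_1^2}{n_1}, \dfrac{t_2^2 c_2^2}{n_2}\right\}$ with respect to $w_1$ within the sign of
integration (such differentiation is permissible for $\nu_1, \nu_2 \ge 2$ and for $\nu_1, \nu_2 = 1$ if
$0 < \epsilon \le w_1 \le 1 - \epsilon$) it can be shown that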


d р] йй udi р a 
етігі T; < (4.6) 
where for (i) aw, > aqu, 
=A. ЗЕЕ б 4 . бата 7 
1-К Гацо; өз. Р { hol, PT pili с } ... (4.7) 
= BaPa Уо, ‚ 6—©( А 
= К. 1-4-аи аз Р { hom Pitpil; — ps ) ... (4.8) 
and for (ii) AW, > ауш», 
LK. “А ,,-, ; 24-0, 
=k Taw, асы { T Pay Pi pil; — ee ) ... (4.9) 
I, = К. сараю "без: 4 1—6 
xem Taw, Go Í b Partl, patpat; 5 } ... (4.10) 
s 
where с, = S1 . e _ (уф 


Dra, ТЗ тұлан ; and К is a function of p;, а, and ш, ($ =1, 2). 


If у= у = у, so that 41 = а, = а from (4.7) and (4.8) for the case 
Wa > ақш (і.е. w, < 3) 


, 1 У 0—0, 
у= Gi ee Ууу, е у Тазы. a in (LIT 
В 1--аш {+ 2 У а) = 
1 = 
I, = K’. Da e Eq tau. noa 19 
г l--agw; ü 2 aks ит | e 


Ав Ff} азе фу, ура), @>0) 


from (4.11) and (4.12) it follows for w <4, 
"ESTA ... (413) 


E tic? dil. 

which means that $P\left\{\dfrac{t_1^2 c_1^2}{n_1}, \dfrac{t_2^2 c_2^2}{n_2}\right\}$ increases as $w_1$ increases. Further, from (4.11) and
(4.12) it is evident that at $w_1 = \tfrac{1}{2}$

$$I_1 = I_2. \tag{4.14}$$

Also from (4.9) and (4.10) for the case $a_1 w_1 > a_2 w_2$ (i.e. $w_1 > \tfrac{1}{2}$) it can be shown that
$I_2$ is greater than $I_1$. $P\left\{\dfrac{t_1^2 c_1^2}{n_1}, \dfrac{t_2^2 c_2^2}{n_2}\right\}$, for variation in $w_1$, takes a maximum value at
$w_1 = \tfrac{1}{2}$. In Table 1 below maximum values of $P\left\{\dfrac{t_1^2 c_1^2}{n_1}, \dfrac{t_2^2 c_2^2}{n_2}\right\}$ for suitable values of
$\nu$ (= n - 1) and $\alpha$: (i) 0.90; (ii) 0.95 and (iii) 0.98 are presented. The values have been
worked out from Table 9 (Probability Integral $P(t|\nu)$ of the t-distribution) of Biometrika
Tables, Volume I, using the relation:

$$\max P\left\{\frac{t_1^2 c_1^2}{n_1}, \frac{t_2^2 c_2^2}{n_2}\right\} = \int_{-t_0}^{t_0} f(t_{2\nu})\,dt$$

where $t_0$ satisfies the relation

$$\int_{-t_0}^{t_0} f(t_{\nu})\,dt = \alpha.$$

Also, occasionally, Incomplete Beta Function tables were used.
TABLE 1. MAXIMUM VALUES OF $P\left\{\dfrac{t_1^2 c_1^2}{n_1}, \dfrac{t_2^2 c_2^2}{n_2}\right\}$
FOR VARIATION IN $w_1$

  ν      α = 0.90    α = 0.95    α = 0.98

  1       .9758       .9939       .9990
  2       .9567       .9873       .9978
  3       .9430       .9809       .9961
  4       .9342       .9759       .9943
  5       .9283       .9721       .9928
  6       .9239       .9691       .9915
  8       .9183       .9651       .9895
 10       .9149       .9623       .9880
 12       .9125       .9605       .9869
 14       .9108       .9591       .9860
 16       .9090       .9579       .9854
 18       .9080       .9572       .9848
 20       .9077       .9565       .9844
 22       .9062       .9559       .9840
 24       .9055       .9554       .9836
 30       .9051       .9542       .9830
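
The entries of Table 1 can be approximated numerically instead of by table look-up. The sketch below is an assumption-laden illustration, not the author's procedure: it integrates the Student density by Simpson's rule and finds $t_0$ by bisection, and the function names `central_prob`, `t_value` and `max_probability` are invented here. It relies on the relation stated in the text, namely that for equal degrees of freedom the maximum is the central probability of a t-variate with 2ν d.f. taken between the limits $\pm t_0$, where $t_0$ is the α-level value for ν d.f.

```python
import math


def t_density(t, nu):
    """Frequency function of Student's t with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)


def central_prob(t, nu, steps_per_unit=100):
    """prob[|T| <= t] for T with nu d.f., by Simpson's rule on [0, t]."""
    if t <= 0:
        return 0.0
    n = max(2, 2 * math.ceil(t * steps_per_unit / 2))  # even number of panels
    h = t / n
    total = t_density(0.0, nu) + t_density(t, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_density(i * h, nu)
    return 2 * total * h / 3


def t_value(alpha, nu):
    """Two-sided t-value t0 with prob[|T| <= t0] = alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while central_prob(hi, nu) < alpha:
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if central_prob(mid, nu) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def max_probability(alpha, nu):
    """Maximum over w1 of the coverage probability when nu1 = nu2 = nu."""
    return central_prob(t_value(alpha, nu), 2 * nu)


for nu in (1, 2):
    print(nu, [round(max_probability(a, nu), 4) for a in (0.90, 0.95, 0.98)])
```

For ν = 1 the calculation agrees with the first row of the table to four decimals; small differences in the last digit elsewhere are to be expected, since the published values were read from printed tables.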


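Result 3: The confidence interval

$$\sum_{i=1}^{2} c_i\bar{x}_i \pm \sqrt{\frac{t_1^2 c_1^2 s_1^2}{n_1} + \frac{t_2^2 c_2^2 s_2^2}{n_2}} \tag{4.15}$$

is unbiassed.

Let

$$P_1 = \mathrm{prob}\left[\Big\{\sum_{i=1}^{2} c_i(\bar{x}_i - m_i)\Big\}^2 \le \sum_{i=1}^{2}\frac{t_i^2 c_i^2 s_i^2}{n_i}\right]$$

and

$$P_2 = \mathrm{prob}\left[\Big\{\sum_{i=1}^{2} c_i\bar{x}_i - M'\Big\}^2 \le \sum_{i=1}^{2}\frac{t_i^2 c_i^2 s_i^2}{n_i}\right]. \tag{4.16}$$

Since for fixed $(s_1, s_2)$, $\sum_{i=1}^{2} c_i\bar{x}_i$ is distributed normally about mean $\sum_{i=1}^{2} c_i m_i$ with variance,
say $\sigma_0^2$,

$$P_1 = \int_0^\infty\!\!\int_0^\infty g(s_1)\,g(s_2)\left\{\int_{-T}^{T}\frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\,du\right\} ds_1\,ds_2$$

and

$$P_2 = \int_0^\infty\!\!\int_0^\infty g(s_1)\,g(s_2)\left\{\int_{-T+M''}^{T+M''}\frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\,du\right\} ds_1\,ds_2$$

where $g(s_1)$ = frequency function of $s_1$,
$g(s_2)$ = frequency function of $s_2$,

$$T = \frac{1}{\sigma_0}\sqrt{\sum_{i=1}^{2}\frac{t_i^2 c_i^2 s_i^2}{n_i}} \quad\text{and}\quad M'' = \frac{1}{\sigma_0}\Big(\sum_{i=1}^{2} c_i m_i - M'\Big).$$

As

$$\int_{-T}^{T} e^{-u^2/2}\,du \ge \int_{-T+M''}^{T+M''} e^{-u^2/2}\,du,$$

unbiassedness follows.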


Result 4: The confidence interval so far considered for the two-means problem
is central by nature. A non-central confidence interval with confidence coefficient not
less than any pre-assigned probability level is, however, possible. Consider the
confidence interval:

$$-T_1 \le \sum_{i=1}^{2} c_i(\bar{x}_i - m_i) \le T_2 \tag{4.17}$$

where

$$T_1(s_1^2, s_2^2) = \sqrt{\frac{t_{11}^2 c_1^2 s_1^2}{n_1} + \frac{t_{21}^2 c_2^2 s_2^2}{n_2}}, \qquad T_2(s_1^2, s_2^2) = \sqrt{\frac{t_{12}^2 c_1^2 s_1^2}{n_1} + \frac{t_{22}^2 c_2^2 s_2^2}{n_2}}$$

where $t_{ij}$ (i = 1, 2; j = 1, 2) have been so determined that


Јо = ans [шый = вы 


д HOM (4.18) 
ERA = 0; I Ау) = da; 
where iy > 0(,j = 1, 2) 
and ал09 = 0511-9529 
Now prob (-7,< Хот.) < Ty 
со со | Tal i 25 
= | | 95) 46) Те" а 
| | : з Ph Ул | 5554 
оо © ШІ тұт 
=+ [| ме | көже | Жі |4, o (4.19) 
оо 9 0 x 
where 
g(5,) = frequency function of s, 
2(8:) m3 m » of 82 
Ay) = 2 of x? variate with 1,4,4. 
and (p.d + dei 
л ТА 
| ту 5/04 Фо. 
Now | М ej Memeti) | Vg 022 (420) 
207%, à 
where їл = ag = ді a ‚ (ф= 1,9), 


From (4.19) and (4.20) 
ргоЬ[—Т,<7 eim) < Tal > Mar (4-233) (1-і (а-а) rd 


5. TEST OF HYPOTHESIS 


Тһе problem of the two-means has so far been considered from the approach 


of estimation by confidence interval corresponding to the region В 


2 = a вох 10288 
X e(z;— quur А. 
(аат «tos 
The region $R'$, complementary to the region of acceptance $R$, may be used as a critical region for testing a hypothesis $H$ regarding population means. For example, if it is required to test the hypothesis $H$ that $m_1 = m_2$, the corresponding statistic $T$ may be defined as

$$T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{t_1^2 s_1^2}{n_1} + \dfrac{t_2^2 s_2^2}{n_2}}},$$

$t_1$ and $t_2$ being suitably chosen. If such tests are applied, the first kind of error, i.e., the probability of rejection of the hypothesis when true, will not be exactly equal to an assigned value $1-\alpha$ but would depend upon $w_1$ or the variances of the two populations. For $w_1$ tending to zero or unity, the error of the first kind would tend to the assigned value $1-\alpha$. Hence, one can go in for such tests, if one is prepared to accept tests whose first kind of error would not be exactly equal to a given value $1-\alpha$ but would depend upon the values of the free parameters (here $\sigma_1^2$, $\sigma_2^2$) in such a way that the first kind of error would always be less than or equal to $1-\alpha$. Such tests, however, would be unbiassed, as the complementary region $R$, the region of acceptance, has been proved earlier to be unbiassed.


Two examples have been considered below. Example 1 has been taken from Biometrika Tables, Volume I, 1954 and Example 2 from Statistical Tables by Fisher and Yates.

Example 1: Two samples of sizes $n_1 = 10$ and $n_2 = 15$ furnish the following estimates:

    population     mean     variance
    I              73.4       51
    II             47.1      141

To test the hypothesis about the equality of population means with maximum value of error of the first kind fixed at (i) 0.10, (ii) 0.05 and (iii) 0.02 three statistics (i) $T_1$, (ii) $T_2$ and (iii) $T_3$ respectively may be computed as under:

$$T_i = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{t_{i1}^2 s_1^2}{n_1} + \dfrac{t_{i2}^2 s_2^2}{n_2}}} \quad (i = 1, 2, 3);$$

$$T_1 = \frac{73.4 - 47.1}{6.80} = \frac{26.3}{6.80} = 3.87; \quad T_2 = \frac{73.4 - 47.1}{8.33} = \frac{26.3}{8.33} = 3.16; \quad T_3 = \frac{73.4 - 47.1}{10.26} = \frac{26.3}{10.26} = 2.56;$$

where $t_{ij}$ are $100\alpha_i$ percentile points of Student's t-table of $\nu_j$ d.f. with $\nu_1 = 9$, $\nu_2 = 14$ and $\alpha_1 = 0.10$, $\alpha_2 = 0.05$ and $\alpha_3 = 0.02$. All the statistics are numerically greater than unity. It is seen that with maximum value of the error of the first kind fixed even at 0.02 the hypothesis cannot be accepted.
Apart from the question of testing of hypothesis, confidence intervals $I_1$, $I_2$ and $I_3$ for the difference of population means with confidence coefficient respectively not less than (i) 0.90, (ii) 0.95 and (iii) 0.98 may be built up as

(i)   $I_1 \to 26.3 \pm 6.80$   (19.50 → 33.10)
(ii)  $I_2 \to 26.3 \pm 8.33$   (17.97 → 34.63)
(iii) $I_3 \to 26.3 \pm 10.26$  (16.04 → 36.56)

The 90 per cent confidence limits according to Welch's solution for this case are 19.8 to 32.8, which correspond to interval $I_1$ having a range 19.5 to 33.1.
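The arithmetic of Example 1 can be retraced with a short sketch. It assumes the statistic has the form $T = (\bar{x}_1 - \bar{x}_2)/\sqrt{t_1^2 s_1^2/n_1 + t_2^2 s_2^2/n_2}$ and uses two-sided Student's t points copied from standard tables; the names `T_POINTS`, `half_width` and `results` are illustrative only and do not come from the paper.

```python
import math

# Example 1 data as quoted in the text.
n1, mean1, var1 = 10, 73.4, 51.0
n2, mean2, var2 = 15, 47.1, 141.0

# Two-sided Student's t points taken from standard tables (assumed values).
# Key: maximum error of the first kind; value: (t with 9 d.f., t with 14 d.f.).
T_POINTS = {
    0.10: (1.833, 1.761),
    0.05: (2.262, 2.145),
    0.02: (2.821, 2.624),
}


def half_width(t1, t2):
    """Denominator of T: sqrt(t1^2 s1^2/n1 + t2^2 s2^2/n2)."""
    return math.sqrt(t1 ** 2 * var1 / n1 + t2 ** 2 * var2 / n2)


diff = mean1 - mean2
results = {}
for level, (t1, t2) in T_POINTS.items():
    h = half_width(t1, t2)
    results[level] = {
        "T": diff / h,                    # hypothesis rejected when |T| > 1
        "half_width": h,
        "interval": (diff - h, diff + h),  # confidence coefficient >= 1 - level
    }

for level, r in results.items():
    low, high = r["interval"]
    print(f"level {level:.2f}: T = {r['T']:.2f}, interval = ({low:.2f}, {high:.2f})")
```

With three-decimal table values the half-widths and statistics agree with the figures quoted in the example to two decimal places.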


Example 2: A physical constant evaluated by a new method gives a mean of twelve determinations,

$$\bar{x}_1 = 4.77383,$$

and the sum of squares of the deviations of these values from their mean is

$$\sum (x - \bar{x}_1)^2 = 0.11580 \times 10^{-1},$$

so that from 11 degrees of freedom the variance of the mean is estimated to be

$$s_1^2(\bar{x}_1) = 0.8773 \times 10^{-4}$$

and the estimated standard deviation of the mean is

$$s_1(\bar{x}_1) = 0.9366 \times 10^{-2}.$$

Numerous previous determinations, using different methods, have given the value

$$\bar{x}_2 = 4.744,$$

where $\bar{x}_2$ has a standard error based on a large number of degrees of freedom with

$$s_2(\bar{x}_2) = 0.00382.$$
To test the hypothesis about the equality of population means with maximum value of error of the first kind fixed at (i) 0.10, (ii) 0.05 and (iii) 0.02 three statistics (i) $T_1$, (ii) $T_2$ and (iii) $T_3$ respectively may be computed as under:

$$T_i = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{t_{i1}^2 s_1^2(\bar{x}_1) + t_{i2}^2 s_2^2(\bar{x}_2)}} \quad (i = 1, 2, 3);$$

$$T_1 = \frac{0.02983}{10^{-2} \times \sqrt{3.2246}} = \frac{2.983}{1.80} = 1.66; \quad T_2 = \frac{0.02983}{10^{-2} \times \sqrt{4.8105}} = \frac{2.983}{2.19} = 1.36; \quad T_3 = \frac{0.02983}{10^{-2} \times \sqrt{7.2704}} = \frac{2.983}{2.70} = 1.10;$$

where $t_{ij}$ are $100\alpha_i$ percentile points of Student's t-table of $\nu_j$ d.f. with $\nu_1 = 11$ and $\nu_2 = \infty$ and $\alpha_1 = 0.10$, $\alpha_2 = 0.05$ and $\alpha_3 = 0.02$. All the statistics are numerically greater than unity. It is seen that with maximum value of the error of the first kind fixed even at 0.02 the hypothesis cannot be accepted.
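The intermediate quantities of Example 2 can be checked in the same way. The sketch below assumes that the variance of the mean is the sum of squares divided by $n(n-1)$, and that the statistic is the difference of the two means divided by $\sqrt{t_1^2 s_1^2(\bar{x}_1) + t_2^2 s_2^2(\bar{x}_2)}$. The t points (11 d.f.) and normal points are taken from standard tables, and all variable names are illustrative.

```python
import math

n = 12
sum_sq = 0.11580e-1          # sum of squared deviations of the new determinations
mean_new, mean_old = 4.77383, 4.744
se_old = 0.00382             # standard error of the earlier value (large d.f.)

# Variance and standard deviation of the mean from 11 degrees of freedom.
var_mean_new = sum_sq / (n * (n - 1))
se_new = math.sqrt(var_mean_new)

# Assumed two-sided points: (Student's t with 11 d.f., normal deviate).
T_POINTS = {
    0.10: (1.796, 1.645),
    0.05: (2.201, 1.960),
    0.02: (2.718, 2.326),
}

stats = {
    level: (mean_new - mean_old) / math.sqrt((t1 * se_new) ** 2 + (t2 * se_old) ** 2)
    for level, (t1, t2) in T_POINTS.items()
}

print(f"variance of mean = {var_mean_new:.4e}, s.d. of mean = {se_new:.4e}")
for level, value in stats.items():
    print(f"level {level:.2f}: T = {value:.2f}")
```

All three statistics exceed unity, which is the condition the text uses to reject the hypothesis at each level.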
6. BEHRENS-FISHER AND WELCH'S SOLUTION

Fisher (1935) indicated that by means of fiducial argument in statistical inference, given two samples of $n_i$ ($i = 1, 2$) units from two normal populations with $\bar{x}_i$ and $s_i^2$ ($i = 1, 2$) as sample estimates of population means and variances, the relation

$$d = \frac{(\bar{x}_1 - m_1) - (\bar{x}_2 - m_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = u_1 \sin\theta - u_2 \cos\theta \qquad \ldots (6.1)$$

where $u_i = \dfrac{(\bar{x}_i - m_i)\sqrt{n_i}}{s_i}$ ($i = 1, 2$) and $\tan\theta = \dfrac{s_1/\sqrt{n_1}}{s_2/\sqrt{n_2}}$,

originally due to Behrens (1929), could be used to test the hypothesis that $\delta$ ($= m_1 - m_2$) has the value zero. Sukhatme in 1938 published critical values of the Behrens-Fisher test as defined in (6.1) for 5 per cent level of significance for $\nu_1, \nu_2 = 6, 8, 12, 24$ and $\infty$ for $\theta = 0°, 15°, 30°, 45°, 60°, 75°, 90°$ where $\tan\theta$ is $\dfrac{s_1/\sqrt{n_1}}{s_2/\sqrt{n_2}}$. Further critical values for 1 per cent level of significance for $\nu_1, \nu_2 = 6, 8, 12, 24$ and $\infty$ were published later. To calculate critical values of (6.1) Sukhatme assumed that for given value of $\theta$, $u_1$ and $u_2$ were independently distributed as Student's t-variate with $n_i - 1$ ($i = 1, 2$) d.f. Critical values of the Behrens-Fisher test for small odd degrees of freedom $\nu_1, \nu_2 = 1, 3, 5$ and 7 were published by Fisher and Healy in 1956.


Critical values of the Behrens-Fisher test have been tabulated for different values of $\theta$ where $\tan\theta = \dfrac{s_1/\sqrt{n_1}}{s_2/\sqrt{n_2}}$. For $\theta = 0°$, critical values of the Behrens-Fisher test are exactly equal to t-values of Student's t-table with $\nu_2$ d.f. Also for $\theta = 90°$, critical values of the Behrens-Fisher test are exactly equal to t-values of Student's t-table with $\nu_1$ d.f. For intermediate values of $\theta$, critical values of the Behrens-Fisher test for $\nu_1 = \nu_2 = \nu$ and $\nu = 6, 8, 12$ and 24 are numerically less than tabulated critical values for $\theta = 0°$ (or $90°$, which are numerically equal in such cases) for 5 per cent and 1 per cent level of significance. For $\nu_1 \neq \nu_2$ and $\nu_1, \nu_2 = 6, 8, 12, 24$ and $\infty$ critical values of the Behrens-Fisher test for intermediate values of $\theta$ usually lie in between tabulated critical values for $\theta = 0°$ and $\theta = 90°$ and are occasionally less than both of them. For $\nu_1 = \nu_2 = 1$ critical values of the test for intermediate values of $\theta$ are, however, numerically higher than corresponding critical values for $\theta = 0°$ (or $90°$) for 10 per cent, 5 per cent, 2 per cent and 1 per cent level of significance.
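The decomposition of the Behrens-Fisher statistic into two Student-type components can be checked numerically. The sketch below assumes the reading $d = u_1 \sin\theta - u_2 \cos\theta$ with $\tan\theta = (s_1/\sqrt{n_1})/(s_2/\sqrt{n_2})$, so that $\theta = 0°$ leaves only $u_2$ and $\theta = 90°$ only $u_1$. The function name and the trial population means are hypothetical.

```python
import math


def behrens_fisher_d(mean1, var1, n1, mean2, var2, n2, m1=0.0, m2=0.0):
    """Return (d, theta in degrees, u1, u2) for two samples.

    m1 and m2 are hypothesised population means (illustrative arguments).
    """
    se1 = math.sqrt(var1 / n1)          # s1 / sqrt(n1)
    se2 = math.sqrt(var2 / n2)          # s2 / sqrt(n2)
    d = ((mean1 - m1) - (mean2 - m2)) / math.hypot(se1, se2)
    theta = math.atan2(se1, se2)        # tan(theta) = se1 / se2
    u1 = (mean1 - m1) / se1
    u2 = (mean2 - m2) / se2
    return d, math.degrees(theta), u1, u2


# Data of Example 1, with an arbitrary common population mean of 50.
d, theta_deg, u1, u2 = behrens_fisher_d(73.4, 51.0, 10, 47.1, 141.0, 15, m1=50.0, m2=50.0)
theta = math.radians(theta_deg)
recombined = u1 * math.sin(theta) - u2 * math.cos(theta)
print(f"d = {d:.4f}, theta = {theta_deg:.2f} degrees, u1 sin(theta) - u2 cos(theta) = {recombined:.4f}")
```

The two printed values of the statistic coincide, which is the identity used when critical values are tabulated against $\theta$.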


Welch (1947) considered the problem of finding a function $V$ which is such that

$$\text{prob}\,[\,|y - \eta| \le V(s_1^2, s_2^2, \ldots, s_k^2, \alpha)\,] = \alpha \qquad \ldots (6.2)$$

for all values of $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$, where $y$ is a normal variate distributed about mean $\eta$ with variance $\sum \lambda_i \sigma_i^2$, $s_i^2$ are independent estimates of $\sigma_i^2$, and $\lambda_i$ are known positive constants ($i = 1, 2, \ldots, k$). For $y = \bar{x}_1 - \bar{x}_2$, Welch's problem is to find $V(s_1^2, s_2^2)$ which is such that

$$\text{prob}\,[\,|(\bar{x}_1 - \bar{x}_2) - (m_1 - m_2)| \le V(s_1^2, s_2^2)\,] = \alpha. \qquad \ldots (6.3)$$

Wilks (1940) stated that an exact solution of the form (6.3) is not possible; no proof has, however, been published. Welch has put forward a series for the case (6.2) (which includes the case (6.3) as well) as under:


Visi, = gp ЕЗ ag, — ВА (а) 


pu EAM ete. vee (64) 


where $c_i = \lambda_i s_i^2 \Big/ \sum_{1}^{k} \lambda_i s_i^2$,

$f_i$ = d.f. of $s_i^2$ ($i = 1, 2, \ldots, k$)

and $\xi$ is the tabulated value of the normal probability table so that


к J e LR 
In the words of Bartlett (1956) “there is a permissible criticism of Welch's solution, namely, that the existence of an exact solution in his sense has never been rigorously established.” According to Wallace (1958) “it is still not known whether a non-randomized similar level $\alpha$ test exists.”

Critical values of Welch's solution for the two sample case only have been calculated by Aspin and are given in Biometrika Tables, Volume I (Table No. 11, 1954 print). The values given in Biometrika Tables cover the range (i) $\nu_1, \nu_2 = 6, 8, 10, 15, 20$ and $\infty$ for $\alpha = 0.90$ and (ii) $\nu_1, \nu_2 = 10, 12, 15, 20, 30$ and $\infty$ for $\alpha = 0.98$. Further critical values for $\alpha = 0.95$ and 0.99 are given in Welch, Trickett and James (1956). Critical values of Welch's solution for very small degrees of freedom ($\nu_1, \nu_2 < 5$) have not been published.


Critical values $V(c, \nu_1, \nu_2, \alpha)$ of Welch's solution have been tabulated for given $\nu_1$, $\nu_2$ and $\alpha$ for the ratio $c = \dfrac{\lambda_1 s_1^2}{\lambda_1 s_1^2 + \lambda_2 s_2^2}$. This, of course, does not mean that critical values of Welch's solution refer to sub-sets having observed variance ratios. Both Welch's solution and the present solution refer to unrestricted variation of $\bar{x}_1$, $\bar{x}_2$, $s_1^2$ and $s_2^2$. For $c = 0$ critical values of Welch's solution are exactly equal to t-values of Student's t-table with $\nu_2$ degrees of freedom. Also for $c = 1$ critical values of Welch's solution are exactly equal to t-values of Student's t-table with $\nu_1$ degrees of freedom. For intermediate values of $c$, $V(c, \nu_1, \nu_2, \alpha)$ numerically lies in between $V(0, \nu_1, \nu_2, \alpha)$ and $V(1, \nu_1, \nu_2, \alpha)$.


To compare the solution derived in this paper with Welch's solution, the solution derived in this paper can be written as under:


| (0—1)? HATHA 


ДО суу RAA | > * des 
Using relation (6.5), critical values of (i) Welch's solution and (ii) the present solution for the cases (i) $\alpha = 0.95$; $\nu_2 = 8$; $\nu_1 = 8, 12$ and $\infty$ and (ii) $\alpha = 0.99$; $\nu_2 = 12$; $\nu_1 = 12$ and $\infty$ have been given in Tables 2 and 3. Critical values of the present solution as given in column (4) of Tables 2 and 3 have been worked out from the relation:

$$\sqrt{t_1^2 c + t_2^2 (1 - c)}$$

where $c = \dfrac{\lambda_1 s_1^2}{\lambda_1 s_1^2 + \lambda_2 s_2^2}$

for different values of $c$ as indicated in column (1) of Table 2,
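Column (4) of Table 2 can be regenerated from this relation. The sketch below takes the relation to be $\sqrt{t_1^2 c + t_2^2(1-c)}$, with $t_1$ and $t_2$ the two-sided 5 per cent points of Student's t for $\nu_1 = 12$ and $\nu_2 = 8$ degrees of freedom. The t points are copied from standard tables and the function name is illustrative.

```python
import math


def present_critical_value(c, t1, t2):
    """Critical value sqrt(t1^2 c + t2^2 (1 - c)) for a variance ratio c in [0, 1]."""
    return math.sqrt(t1 ** 2 * c + t2 ** 2 * (1.0 - c))


T_NU1 = 2.179   # assumed table value: 5 per cent point, 12 d.f.
T_NU2 = 2.306   # assumed table value: 5 per cent point, 8 d.f.

table = [(i / 10, present_critical_value(i / 10, T_NU1, T_NU2)) for i in range(11)]
for c, value in table:
    print(f"c = {c:.1f}   critical value = {value:.2f}")
```

At $c = 0$ and $c = 1$ the relation reduces to the Student's t points for $\nu_2$ and $\nu_1$ respectively, and the intermediate values move monotonically between them.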


values, properly weighted by the frequency of occurrence of $c$, would, however, be smaller than the maximum magnitude
TABLE 2. CRITICAL VALUES OF (i) WELCH (ii) BEHRENS-FISHER AND
(iii) PRESENT SOLUTION FOR $\nu_2 = 8$; $\nu_1 = 8$, 12 AND $\infty$
FOR 5 PER CENT LEVEL OF SIGNIFICANCE

  c      Welch's     Behrens-Fisher    present     difference           difference
         solution    test*             solution    col. (4)-col. (2)    col. (4)-col. (3)
 (1)     (2)         (3)               (4)         (5)                  (6)

                         $\nu_2 = 8$; $\nu_1 = 8$
 0.0     2.31        2.31              2.31        0.00                 0.00
 0.1     2.25        2.30              2.31        0.06                 0.01
 0.2     2.20        2.30              2.31        0.11                 0.01
 0.3     2.14        2.29              2.31        0.17                 0.02
 0.4     2.10        2.29              2.31        0.21                 0.02
 0.5     2.08        2.29              2.31        0.23                 0.02
 0.6     2.10        2.29              2.31        0.21                 0.02
 0.7     2.14        2.29              2.31        0.17                 0.02
 0.8     2.20        2.30              2.31        0.11                 0.01
 0.9     2.25        2.30              2.31        0.06                 0.01
 1.0     2.31        2.31              2.31        0.00                 0.00

                         $\nu_2 = 8$; $\nu_1 = 12$
 0.0     2.31        2.31              2.31        0.00                 0.00
 0.1     2.25        2.29              2.29        0.04                 0.00
 0.2     2.20        2.27              2.28        0.08                 0.01
 0.3     2.15        2.26              2.27        0.12                 0.01
 0.4     2.10        2.24              2.26        0.16                 0.02
 0.5     2.07        2.22              2.24        0.17                 0.01
 0.6     2.07        2.22              2.23        0.16                 0.01
 0.7     2.08        2.21              2.22        0.14                 0.01
 0.8     2.11        2.20              2.20        0.09                 0.00
 0.9     2.14        2.19              2.19        0.05                 0.00
 1.0     2.18        2.18              2.18        0.00                 0.00

                         $\nu_2 = 8$; $\nu_1 = \infty$
 0.0     2.31        2.31              2.31        0.00                 0.00
 0.1     2.25        2.27              2.27        0.02                 0.00
 0.2     2.20        2.23              2.24        0.04                 0.01
 0.3     2.14        2.20              2.21        0.07                 0.01
 0.4     2.09        2.16              2.17        0.08                 0.01
 0.5     2.05        2.13              2.14        0.09                 0.01
 0.6     2.01        2.09              2.10        0.09                 0.01
 0.7     1.99        2.06              2.07        0.08                 0.01
 0.8     1.97        2.03              2.03        0.06                 0.00
 0.9     1.96        1.99              2.00        0.04                 0.00
 1.0     1.96        1.96              1.96        0.00                 0.00

* Values have been worked out from tabulated values as given in Table VI, Statistical Tables by Fisher and Yates, by interpolation.




SANKHYA : THE INDIAN JOURNAL OF STATISTICS : SERIES A


TABLE 3. CRITICAL VALUES OF (i) WELCH (ii) BEHRENS-FISHER AND
(iii) PRESENT SOLUTION FOR $\nu_2 = 12$; $\nu_1 = 12$ AND $\infty$
FOR 1 PER CENT LEVEL OF SIGNIFICANCE

  c      Welch's     Behrens-Fisher    present     difference           difference
         solution    test*             solution    col. (4)-col. (2)    col. (4)-col. (3)
 (1)     (2)         (3)               (4)         (5)                  (6)

                         $\nu_2 = 12$; $\nu_1 = 12$
 0.0     3.05        3.06              3.06        0.01                 0.00
 0.1     2.98        3.02              3.06        0.08                 0.04
 0.2     2.91        2.99              3.06        0.15                 0.07
 0.3     2.84        2.97              3.06        0.22                 0.09
 0.4     2.78        2.96              3.06        0.28                 0.10
 0.5     2.76        2.95              3.06        0.30                 0.11
 0.6     2.78        2.96              3.06        0.28                 0.10
 0.7     2.84        2.97              3.06        0.22                 0.09
 0.8     2.91        2.99              3.06        0.15                 0.07
 0.9     2.98        3.02              3.06        0.08                 0.04

                         $\nu_2 = 12$; $\nu_1 = \infty$
 0.0     3.05        3.06              3.06        0.01                 0.00
 0.1     2.98        3.00              3.01        0.03                 0.01
 0.2     2.91        2.94              2.96        0.05                 0.02
 0.3     2.84        2.88              2.92        0.08                 0.04
 0.4     2.77        2.83              2.87        0.10                 0.04
 0.5     2.71        2.78              2.83        0.12                 0.05
 0.6     2.65        2.73              2.78        0.13                 0.05
 0.7     2.62        2.68              2.73        0.11                 0.05
 0.8     2.59        2.64              2.68        0.09                 0.04
 0.9     2.58        2.61              2.63        0.05                 0.02
 1.0     2.58        2.58              2.58        0.00                 0.00

* Values have been worked out from tabulated values as given in Table VI, Statistical Tables by Fisher and Yates, by interpolation.


having observed values of $\dfrac{s_1/\sqrt{n_1}}{s_2/\sqrt{n_2}}$, whereas in the present case critical values refer to unrestricted variation of the four sample estimates $\bar{x}_1$, $\bar{x}_2$, $s_1^2$ and $s_2^2$.


Bearing in mind the broad limitations of comparing the critical values of the present solution with the critical values of the Behrens-Fisher test, in column (6) of Tables 2 and 3 differences in critical values of the two solutions have been shown. It is seen that the differences in the critical values are small.


Further comparison of critical values of the present solution with the critical values of the Behrens-Fisher test for small d.f. $\nu_1, \nu_2 = 1, 3$ and 5 has been done in Table 4. [Welch's solution has not been considered because critical values of Welch's solution for $\nu_1, \nu_2 < 5$ have not been published.] It is seen from columns (4) and (7) that excepting the cases (i) $\nu_1 = \nu_2 = 1$; (ii) $\nu_1 = 1$; $\nu_2 = 3$ differences in the critical values are usually small.


TABLE 4. CRITICAL VALUES OF (i) BEHRENS-FISHER TEST AND
(ii) PRESENT SOLUTION FOR SMALL ODD DEGREES OF
FREEDOM FOR 5 PER CENT LEVEL OF SIGNIFICANCE

 θ      Behrens-       present     difference           Behrens-       present     difference
        Fisher test*   solution    col. (3)-col. (2)    Fisher test*   solution    col. (6)-col. (5)
 (1)    (2)            (3)         (4)                  (5)            (6)         (7)

        $\nu_1 = 1$; $\nu_2 = 1$                        $\nu_1 = 1$; $\nu_2 = 3$
 0°     12.706         12.706       0.000                3.182          3.182       0.000
 15°    15.562         12.706      -2.856                4.960          4.501      -0.459
 30°    17.357         12.706      -4.651                7.123          6.925      -0.198
 45°    17.969         12.706      -5.263                9.303          9.262      -0.041
 60°    17.357         12.706      -4.651               11.112         11.118       0.006
 75°    15.562         12.706      -2.856               12.294         12.300       0.006
 90°    12.706         12.706       0.000               12.706         12.706       0.000

        $\nu_1 = 1$; $\nu_2 = 5$                        $\nu_1 = 3$; $\nu_2 = 3$
 0°      2.571          2.571       0.000                3.182          3.182       0.000
 15°     4.218          4.121      -0.097                3.191          3.182      -0.009
 30°     6.636          6.732       0.096                3.225          3.182      -0.043
 45°     9.090          9.166       0.076                3.244          3.182      -0.062
 60°    11.043         11.077       0.034                3.225          3.182      -0.043
 75°    12.282         12.292       0.010                3.191          3.182      -0.009
 90°    12.706         12.706       0.000                3.182          3.182       0.000

        $\nu_1 = 3$; $\nu_2 = 5$                        $\nu_1 = 5$; $\nu_2 = 5$
 0°      2.571          2.571       0.000                2.571          2.571       0.000
 15°     2.626          2.617      -0.009                2.564          2.571       0.007
 30°     2.756          2.737      -0.019                2.562          2.571       0.009
 45°     2.897          2.893      -0.004                2.565          2.571       0.006
 60°     3.026          3.041       0.015                2.562          2.571       0.009
 75°     3.134          3.145       0.011                2.564          2.571       0.007
 90°     3.182          3.182       0.000                2.571          2.571       0.000

* Values taken from Table VI, Statistical Tables (1957 edition) by Fisher and Yates.
7. CONCLUSION

Given two samples of $n_i$ ($i = 1, 2$) units from two normal populations, having variances not necessarily equal, a confidence interval for any linear function of population means in terms of sample estimates of population means and variances and tabulated values of Student's t-tables is possible. If the population variances are unknown the only function of the form $\lambda_1 s_1^2 + \lambda_2 s_2^2$ which with minimum values of $\lambda_1$ and $\lambda_2$ would satisfy the relation


[2 т) < А, --4,9 
1 


22 ; А 
with probability not less an a is 09181 809% — Further with maximum value of 


т өл 


the error of the first kind (probability of rejection of hypothesis when true) fixed at any given value, any hypothesis regarding the equality of population means (or any linear function of population means) of the two populations can be tested. Such tests are unbiassed.
SOME LIMIT THEOREMS FOR JOINT DISTRIBUTIONS


By J. SETHURAMAN 


Indian Statistical Institute 


SUMMARY. In this paper we discuss the convergence of a sequence of joint distributions when it is known that the associated sequences of marginal and conditional distributions converge. Illustrating by means of an example that the joint distributions need not converge weakly when the marginal and conditional distributions converge only weakly, several results are obtained by suitably strengthening the modes of convergence of the latter distributions. An illustrative application of these results is given.


1. INTRODUCTION 


When $Z$ is the product space $(X \times Y)$, a random variable $\zeta$ on $(Z, U)$ is of the form $(\xi, \eta)$ where $\xi$ is a random variable on $(X, S)$ and $\eta$ on $(Y, T)$. Let $\{\zeta_n\}$ be a sequence of random variables on $Z$ and let $\lambda_n(\cdot)$, $\mu_n(\cdot)$ and $\nu_n(x, \cdot)$ denote the joint distribution of $(\xi_n, \eta_n)$, the marginal distribution of $\xi_n$, and the conditional distribution of $\eta_n$ given $\xi_n = x$, respectively. The theorems of the present work are concerned with the convergence of $\{\lambda_n\}$ when $\{\mu_n\}$ and $\{\nu_n\}$ are known to converge in some sense. In an earlier paper this problem has been mentioned by Sukhatme and Sethuraman (1959). Theorem 2 was referred to and used earlier in another paper by the author (Sethuraman, 1961).


2. NOTATIONS AND PRELIMINARIES 


Before embarking on the statement and the proofs of our theorems, we explain in this section our notations and mention some well-known results which form the basic tools of this paper.

Throughout this work we will be concerned with two measure spaces $(X, S)$ and $(Y, T)$, their product $(Z, U) = (X \times Y, S \times T)$ and a sequence of random variables $(\xi_n, \eta_n)$ taking values in $Z$. The distribution of $(\xi_n, \eta_n)$ on $(Z, U)$ will be denoted by $\lambda_n$ while the distribution of $\xi_n$ on $(X, S)$ will be denoted by $\mu_n$. We assume that the conditional distribution of $\eta_n$ given $\xi_n = x$ exists as a probability measure, i.e. there exists a function $\nu_n(x, B)$ which is a probability measure on $T$ for each $x \in X$ and is a measurable function on $X$ for each $B \in T$ and further satisfies the equation

$$\lambda_n(A \times B) = \int_A \nu_n(x, B)\, d\mu_n \quad \text{for all } A \in S \text{ and } B \in T.$$

In this case, if $C$ is any set in $U$ and $C_x$ for each $x \in X$ denotes the sub-set $\{y : y \in Y, (x, y) \in C\}$, then $\lambda_n(C) = \int \nu_n(x, C_x)\, d\mu_n$. (See Halmos, 1950).
If $p_1, p_2, \ldots$ is a sequence of probability measures defined on a measurable space $(M, V)$, we shall say $p_n$ converges strongly to $p$ ($p_n \to p$ in symbols) if $p_n(C) \to p(C)$ for each $C \in V$. It is well known that for a given sequence $p_n$ there exists a $p$ such that $p_n \to p$ if and only if $\lim p_n(C)$ exists for all $C \in V$; in this case the $p_n$'s are equicontinuous, i.e. if $C_k$ is any decreasing sequence of sets with $\bigcap_k C_k = \emptyset$, the null set, then $\sup_n p_n(C_k) \to 0$ as $k \to \infty$. (See Halmos, 1950). It is also known that $p_n \to p$ if and only if $\int g\, dp_n \to \int g\, dp$ for all bounded measurable functions $g$. (See Halmos, 1950). If the densities $f_n(m)$ of $p_n$ with respect to some $\sigma$-finite measure $p_0$ converge in measure $[p_0]$ to a density function $f(m)$ then there is a $p$ such that $p_n \to p$. (See Scheffé, 1947). The above is a sufficient condition for the strong convergence of a sequence of probability measures $\{p_n\}$ which is convenient in practice.


We shall also require the notion of weak convergence of probability measures. This requires that the basic space be topological, and that all continuous functions be measurable. If $p, p_1, p_2, \ldots$ are probability measures on such a space $M$, we shall say $p_n$ converges to $p$ weakly ($p_n \Rightarrow p$ in symbols) if $\int g\, dp_n \to \int g\, dp$ for all bounded continuous functions $g(m)$ on $M$. A set $C$ is said to be a continuity set of $p$ if $p(\text{bd } C) = 0$ where $\text{bd } C$ is the boundary of $C$. $p_n \Rightarrow p$ if and only if $\lim p_n(C) = p(C)$ for each $C$ that is a continuity set of $p$. (See Billingsley, 1956). Further, if $M$ is separable complete metric, $\{p_n\}$ is compact under weak convergence if and only if, for each $\epsilon > 0$, there is a compact set $C \subset M$ with $p_n(C) > 1 - \epsilon$ for all $n$. (See Prohorov, 1956; Varadarajan, 1958).


For the formulation of one of our theorems we require the notion of UC* convergence (allied to that of Parzen (1954)) of a family of sequences of probability measures. Let $\nu_n(\theta, \cdot)$, $n = 0, 1, \ldots$ be a family of sequences of probability measures on $M = R_k$, the euclidean space of $k$ dimensions. It is assumed that the index $\theta$ takes values in a compact metric space $I$. Let $\phi_n(t, \theta)$ denote the characteristic function of $\nu_n(\theta, \cdot)$, i.e.

$$\phi_n(t, \theta) = \int \exp(i t' m)\, \nu_n(\theta, dm), \quad n = 0, 1, \ldots \qquad \ldots (2.1)$$

Definition: $\nu_n(\theta, \cdot)$ is said to converge to $\nu_0(\theta, \cdot)$ in the UC* sense relative to $\theta \in I$ if

(a) $\sup_{\theta \in I} |\phi_n(t, \theta) - \phi_0(t, \theta)| \to 0$ as $n \to \infty$ and $\phi_0(t, \theta)$ is equicontinuous in $\theta$ at $t = 0$ $\qquad \ldots (2.2)$

and (b) $\phi_0(t, \theta)$ is a continuous function of $\theta$ for each $t$. $\qquad \ldots (2.3)$
We mention here that Parzen (1954) uses (2.2) alone as the definition of UC* convergence. (2.2) and (2.3) imply that $\phi_0(t, \theta)$ is continuous in $t$ and $\theta$. (2.2) and (2.3) also imply the following:

$\theta_n \to \theta_0$ implies $\phi_n(t, \theta_n) \to \phi_0(t, \theta_0)$ $\qquad \ldots (2.4)$

which means $\nu_n(\theta_n, \cdot) \Rightarrow \nu_0(\theta_0, \cdot)$; $\qquad \ldots (2.5)$

$\theta_n \to \theta_0$ implies $\phi_0(t, \theta_n) \to \phi_0(t, \theta_0)$ $\qquad \ldots (2.6)$

which means $\nu_0(\theta_n, \cdot) \Rightarrow \nu_0(\theta_0, \cdot)$. $\qquad \ldots (2.7)$


3. MAIN THEOREMS

Before proceeding to state and prove the first of our main results we establish the following lemma.


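Lemma 1: Let $\{p_n\}$ be a sequence of probability measures on an arbitrary measure space $(M, V)$ converging strongly to a measure $p$. Let $\{u_n\}$ be a sequence of uniformly bounded $V$-measurable functions converging almost everywhere $[p]$ to a function $u$. Then $\int u_n\, dp_n \to \int u\, dp$.

Proof: Let $|u_n(m)| \le A$ for all $n$.

Define $D_k = \bigcup_{n \ge k} \{m : |u_n(m) - u(m)| > \epsilon\}$. We know that the $D_k$ are decreasing and that if $\bigcap_k D_k = D$ then $p(D) = 0$. We have, for $n \ge k$,

$$\left| \int u_n\, dp_n - \int u\, dp \right| = \left| \int u_n\, dp_n - \int u\, dp_n + \int u\, dp_n - \int u\, dp \right|$$

$$\le \left| \int u_n\, dp_n - \int u\, dp_n \right| + \left| \int u\, dp_n - \int u\, dp \right|$$

$$\le 2A\, p_n(D_k) + \epsilon + \left| \int u\, dp_n - \int u\, dp \right|. \qquad \ldots (3.1)$$

The first term tends to zero since the $p_n$ are equicontinuous and the last term tends to zero since $p_n \to p$. Since $\epsilon > 0$ is arbitrary, the lemma is proved.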


Suppose that, in some sense, the (marginal) distributions $\mu_n$ converge to $\mu$ and the (conditional) distributions $\nu_n(x, \cdot)$ converge to $\nu(x, \cdot)$. Further, let the joint distributions $\lambda_n$ converge. It is plausible that $\lambda_n$ then converge to the distribution $\lambda_0$ defined by $\mu$ and $\nu(\cdot, \cdot)$, i.e.

$$\lambda_0(A \times B) = \int_A \nu(x, B)\, d\mu \qquad \ldots (3.2)$$

over rectangle sets $A \times B$. This defines a distribution $\lambda_0$ uniquely on $(Z, U)$. In what follows, by $\lambda_0$ we mean the distribution defined by the relation (3.2).




Theorem 1: If the sequence of (marginal) distributions $\{\mu_n\}$ converges strongly to $\mu$ and if for almost all $x$ $[\mu]$ the sequence of (conditional) distributions $\{\nu_n(x, \cdot)\}$ converges strongly to $\nu(x, \cdot)$ then the sequence of (joint) distributions $\{\lambda_n\}$ converges strongly to $\lambda_0$.


Proof: Let $g(x, y)$ be any $U$-measurable function on $Z$, bounded by $K$. We define a sequence of $S$-measurable functions $v_n(x)$ as follows:

$$v_n(x) = \int g(x, y)\, \nu_n(x, dy). \qquad \ldots (3.3)$$

It is plain that $|v_n(x)| \le K$ and that

$$v_n(x) \to v(x) = \int g(x, y)\, \nu(x, dy) \qquad \ldots (3.4)$$

for almost all $x$ $[\mu]$.

On application of Lemma 1 we find that

$$\int g(x, y)\, d\lambda_n = \int d\mu_n \int g(x, y)\, \nu_n(x, dy) = \int v_n(x)\, d\mu_n \to \int v(x)\, d\mu = \int d\mu \int g(x, y)\, \nu(x, dy) = \int g(x, y)\, d\lambda \qquad \ldots (3.5)$$

where $\lambda(C) = \int \nu(x, C_x)\, d\mu$ with $C_x = \{y : y \in Y, (x, y) \in C\}$. (See Halmos, 1950). This $\lambda$ is the same as $\lambda_0$. Thus $\lambda_n \to \lambda_0$.


Theorem 2: If the sequence of (marginal) distributions $\{\mu_n\}$ converges strongly to $\mu$ and if the sequence of (conditional) distributions $\{\nu_n(x, \cdot)\}$ converges weakly to $\nu(x, \cdot)$ for almost all $x$ $[\mu]$, then the sequence of (joint) distributions $\{\lambda_n\}$ converges weakly to $\lambda_0$.

Proof: The proof of this theorem is on the same lines as Theorem 1 and so is omitted.


In Theorems 1 and 2 we assumed that the marginal distributions converge strongly. We now ask ourselves what happens if the marginal distributions converge only weakly. Naturally, one expects that one shall have to strengthen the mode of convergence of the conditional distributions. We have given an example in section 4 to illustrate the fact that, in general, such strengthening would be necessary. The difficulty in this situation is that the conditional distribution at the $n$-th stage is defined almost everywhere with respect to $\mu_n$ and the $[\mu_n]$ null $x$-sets, the sets of misbehaviour of $\nu_n(x, \cdot)$, vary with $n$. Thus we should introduce some smoothness restriction on the conditional distributions.
Before presenting the last of our theorems we prove several lemmas.

In the following lemmas, $I$ is a compact metric space and $M = R_k$. $g(\theta, m)$ is a bounded continuous function on $I \times M$. $J$ is any bounded interval in $M$. $\{\nu_n(\theta, \cdot)\}$, $n = 0, 1, \ldots$ is a family of sequences of probability measures on $M$ indexed by $\theta$ in $I$. $\nu_n(\theta, \cdot)$ converges to $\nu_0(\theta, \cdot)$ in the UC* sense relative to $\theta \in I$. Let

$$v_n(\theta) = \int g(\theta, m)\, \nu_n(\theta, dm), \quad n = 0, 1, \ldots \qquad \ldots (3.6)$$


Lemma 2: $g(\theta, m)$ is equicontinuous in $\theta$ at each $m$, i.e. if $\theta_n \to \theta_0$ then $g(\theta_n, m) \to g(\theta_0, m)$ uniformly in $m \in J$.

Proof: This follows immediately from the uniform continuity of $g(\theta, m)$ on $I \times J$.

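Lemma 3: $\sup_{\theta \in I} |v_n(\theta) - v_0(\theta)| \to 0$ as $n \to \infty$. $\qquad \ldots (3.7)$

Proof: This lemma is a simple corollary of the general results of Ranga Rao (1960), (1961). We, however, give here a short proof for the sake of continuity. If (3.7) were not true, then there would be a sequence $\{\theta_n\}$ and $a > 0$, such that

$$|v_n(\theta_n) - v_0(\theta_n)| > a \text{ for each } n. \qquad \ldots (3.8)$$

Since $I$ is compact, there is a subsequence $\{\theta_{n_r}\}$ such that $\theta_{n_r} \to \theta_0$ as $r \to \infty$.

We then have

$$|v_{n_r}(\theta_{n_r}) - v_0(\theta_{n_r})| = \left| \int g(\theta_{n_r}, m)\, \nu_{n_r}(\theta_{n_r}, dm) - \int g(\theta_{n_r}, m)\, \nu_0(\theta_{n_r}, dm) \right|$$

$$\le \int |g(\theta_{n_r}, m) - g(\theta_0, m)|\, \nu_{n_r}(\theta_{n_r}, dm)$$

$$+ \left| \int g(\theta_0, m)\, \nu_{n_r}(\theta_{n_r}, dm) - \int g(\theta_0, m)\, \nu_0(\theta_0, dm) \right|$$

$$+ \left| \int g(\theta_0, m)\, \nu_0(\theta_0, dm) - \int g(\theta_0, m)\, \nu_0(\theta_{n_r}, dm) \right|$$

$$+ \int |g(\theta_0, m) - g(\theta_{n_r}, m)|\, \nu_0(\theta_{n_r}, dm).$$

Given any $\epsilon > 0$ the first, second, third and fourth terms on the right hand side can each be made $< \epsilon$ by using Lemma 2 and (2.5), (2.5), (2.7), and Lemma 2 and (2.7), respectively, if $r > R$. Since $\epsilon > 0$ is arbitrary, this is a contradiction to (3.8).

Hence the lemma.

Lemma 4: $v_0(\theta)$ is continuous in $\theta$.

Proof: This follows immediately from (2.7).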
Let $u_n(m)$ be a sequence of functions on a separable complete metric space $M$ converging to a bounded continuous function $u(m)$ uniformly on every compact set. Let $\mu_n$ be a sequence of probability measures on $M$ converging weakly to $\mu$. We then have the following lemma.

Lemma 5: $\int u_n(m)\, d\mu_n \to \int u(m)\, d\mu$.

Proof: The proof is immediate.

In the following theorem $Y = R_k$ and $X$ is any separable complete metric space. Here we impose certain conditions on the conditional distributions similar to those employed by Parzen (1954), Steck (1957) and others.

Theorem 3: Let the sequence of (marginal) distributions $\{\mu_n\}$ converge weakly to $\mu$. Let the sequence of (conditional) distributions $\{\nu_n(x, \cdot)\}$ converge to $\nu(x, \cdot)$ in the UC* sense relative to $x \in I$ for every compact subset $I$ of $X$. Then the sequence $\{\lambda_n\}$ of (joint) distributions converges weakly to $\lambda_0$.


Proof: Let $g(x, y)$ be any bounded continuous function on $X \times Y$. Let $u_n(x) = \int g(x, y)\, \nu_n(x, dy)$, $u(x) = \int g(x, y)\, \nu(x, dy)$. Lemma 3 shows that $u_n(x) \to u(x)$ uniformly in $x \in I$ for every compact $I \subset X$. Lemma 4 shows that $u(x)$ is continuous in $x$.

Now, using Lemma 5,

$$\int g(x, y)\, d\lambda_n = \int d\mu_n \int g(x, y)\, \nu_n(x, dy) = \int u_n(x)\, d\mu_n \to \int u(x)\, d\mu = \int d\mu \int g(x, y)\, \nu(x, dy) = \int g(x, y)\, d\lambda$$

where $\lambda(C) = \int d\mu \int_{C_x} \nu(x, dy)$ with $C_x = \{y : (x, y) \in C\}$. (See Halmos, 1950). Thus $\lambda = \lambda_0$ and $\lambda_n \Rightarrow \lambda_0$.


4. A COUNTER EXAMPLE

We present below an example to show that some such conditions, as imposed in Theorems 1, 2 and 3 are, in general, necessary.

X and Y are the real line and S and T the usual field of Borel sets. The random variable (ξ_n, η_n) takes the values (1/n, 1/n), (1/n, 1+1/n), (1+1/n, 1/n) and (1+1/n,
а t ee 2-8 

> 1+1/n) with probabilities зв в, and 3 respectively if n is even and with 
(RC) free Асқа Sone Y 3 

probabilities 787 742 g and | respectively if n is odd. 
It is easy to see that the marginal distributions of $\xi_n$ and $\eta_n$ converge weakly
to the same distribution with masses it and > at 0 and 1, The conditional 


distributions are trivially convergent. The joint distributions do not converge. 
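The mechanism of such a counter example can be seen in a small sketch. The probabilities below are hypothetical (they are chosen for illustration and are not the values printed in the example): for even $n$ the four support points carry mass 1/4 each, while for odd $n$ the two "diagonal" points carry mass 1/2 each. The function names are illustrative as well.

```python
from fractions import Fraction as F


def joint(n):
    """Joint distribution on the four support points, as {point: probability}.

    The probabilities are hypothetical and alternate with the parity of n.
    """
    a, b = F(1, n), 1 + F(1, n)
    points = [(a, a), (a, b), (b, a), (b, b)]
    if n % 2 == 0:
        probs = [F(1, 4)] * 4
    else:
        probs = [F(1, 2), F(0), F(0), F(1, 2)]
    return dict(zip(points, probs))


def marginal(dist, axis):
    """Marginal distribution of coordinate `axis` (0 or 1)."""
    out = {}
    for point, p in dist.items():
        out[point[axis]] = out.get(point[axis], F(0)) + p
    return out


def mass_near(dist, corner, n):
    """Mass at the support point that tends to `corner`, a point of {0, 1}^2."""
    shift = F(1, n)
    return dist[(corner[0] + shift, corner[1] + shift)]


for n in (10, 11, 100, 101):
    d = joint(n)
    print(n, sorted(marginal(d, 0).values()), mass_near(d, (0, 1), n))
```

Both marginals put mass 1/2 near 0 and 1/2 near 1 for every $n$, so they converge weakly, yet the mass near the corner (0, 1) alternates between 1/4 and 0, so the joint distributions cannot converge.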


5. AN ILLUSTRATIVE APPLICATION

In section 1 we have already mentioned some applications of the results of section 3 made in earlier works. As an illustration we can deduce the asymptotic distribution of several sample quantiles from that of a single quantile and Theorem 2. In particular, assuming the theorem for the asymptotic distribution of the sample median (see Smirnov, 1949; Cramér, 1946), we will show that the sample first quartile and the median are jointly asymptotically normally distributed.

Let $z_1, \ldots, z_{4n+3}$ be $4n+3$ independent observations on a random variable $Z$ with distribution function $F(z)$ which possesses a density function $f(z)$. It is assumed that $f(z)$ is continuous and nonzero at $\theta$, the population median, and at $\delta$, the population first quartile.

Let us denote $\left(\sqrt{n}(z_{(2n+2)} - \theta), \sqrt{n}(z_{(n+1)} - \delta)\right)$ by $(\xi_n, \eta_n)$ where $z_{(2n+2)}$ is the sample median and $z_{(n+1)}$ is the sample first quartile. When $\xi_n$ is fixed at $x$, $\eta_n$ is the normalised median of a sample of size $(2n+1)$ on the random variable $Z$ truncated to the region $(-\infty, \theta + x/\sqrt{n})$, and hence is asymptotically normal. Some algebraic computations show that the mean and variance of the limiting conditional distribution are $\dfrac{x f(\theta)}{2 f(\delta)}$ and $\dfrac{1}{32 f^2(\delta)}$. Since the densities converge pointwise (Cramér, 1946), $\xi_n$ tends strongly to the normal distribution with mean zero and variance $\dfrac{1}{16 f^2(\theta)}$.

An application of Theorem 2 shows that the joint distribution of $(\xi_n, \eta_n)$ converges to the bivariate normal distribution with means zero, zero, variances $\dfrac{1}{16 f^2(\theta)}$, $\dfrac{3}{64 f^2(\delta)}$ and correlation $\dfrac{1}{\sqrt{3}}$.
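The way a limiting marginal and a limiting conditional distribution determine the joint normal limit can be checked with a few lines of arithmetic. The sketch assumes the limiting marginal variance $1/(16 f^2(\theta))$, the conditional mean $x f(\theta)/(2 f(\delta))$ and the conditional variance $1/(32 f^2(\delta))$. The density values passed in are arbitrary illustrative numbers and the function name is hypothetical.

```python
import math


def joint_from_marginal_and_conditional(f_theta, f_delta):
    """Combine the limiting marginal of the median with the limiting conditional
    of the first quartile into bivariate normal parameters."""
    var_xi = 1.0 / (16.0 * f_theta ** 2)        # marginal variance of the median
    slope = f_theta / (2.0 * f_delta)           # conditional mean is slope * x
    cond_var = 1.0 / (32.0 * f_delta ** 2)      # conditional variance

    var_eta = cond_var + slope ** 2 * var_xi    # law of total variance
    cov = slope * var_xi                        # covariance of (xi, eta)
    rho = cov / math.sqrt(var_xi * var_eta)
    return var_xi, var_eta, rho


# Arbitrary positive density values at the median and at the first quartile.
var_xi, var_eta, rho = joint_from_marginal_and_conditional(f_theta=0.40, f_delta=0.32)
print(f"var(xi) = {var_xi:.4f}, var(eta) = {var_eta:.4f}, correlation = {rho:.4f}")
```

Whatever positive values are supplied for the two densities, the second variance equals $3/(64 f^2(\delta))$ and the correlation equals $1/\sqrt{3}$.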
BEST ONE-SIDED BOUNDS FOR INFINITELY DIVISIBLE RANDOM VARIABLES

By HOWARD G. TUCKER
University of California, Riverside, USA

SUMMARY. The essential supremum (essential infimum) is found for a random variable with infinitely divisible distribution which is bounded above (below). Necessary and sufficient conditions are given that almost all sample functions of a continuous separable stochastic process with independent increments be discrete. A direct probabilistic proof is given that a random variable with infinitely divisible distribution be unbounded.
` 


1. INTRODUCTION

A random variable $X$ is said to be infinitely divisible if its characteristic function $\phi_X(u)$ can be written

$$\phi_X(u) = \exp\left\{ iu\gamma + \int_{-\infty}^{\infty} \left( e^{iux} - 1 - \frac{iux}{1+x^2} \right) \frac{1+x^2}{x^2}\, dG(x) \right\}, \qquad \ldots (1.1)$$

where $\gamma$ is a fixed real constant and $G$ is a bounded, real-valued, non-decreasing function. Clearly, $X$ is degenerate (i.e., constant with probability one) if and only if $G$ is identically constant. Pakshirajan and Chatterjee (1956) proved that a non-degenerate infinitely divisible random variable $X$ is unbounded. Recently, Baxter and Shapiro (1960) found necessary and sufficient conditions that a random variable with an infinitely divisible distribution be bounded above (below), and they exhibited an upper (lower) bound. This result was used to obtain necessary and sufficient conditions that almost all of the sample functions of a separable process with stationary, independent increments be non-decreasing (non-increasing).

The purpose of this note is to sharpen both of these results. In Section 2 it 
will be shown (Theorem 1) that the one-sided bounds exhibited by Baxter and Shapiro 
(1960) are best bounds. In Section 3 necessary and sufficient conditions are given 
that almost all of the sample functions of a separable, continuous, decomposable 
process be discrete, i.e., almost every sample function be a function of bounded varia- 
tion over every bounded interval and such that its difference over every interval be 
the sum of its discontinuities. In Section 4 a simple and direct proof of the 
unboundedness of a non-degenerate infinitely divisible random variable is given. 


2. THE BEST ONE-SIDED BOUNDS 
The technique used in this section and in Section 4 consists of a particular 
interpretation of a certain class of probability distributions. We first discuss such 
distributions. , $ 
Let X, X,..., Бе a sequence of independent, identically distributed random 
variables with common distribution function F. Let Y be a random variable with a 
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Poisson distribution, HY = A> 0, and assume that Y, X,, X5, ..., Xp, ... аге inde- 
pendent. Then the distribution of the random sum of random variables 
ip Bae es Ау 


has as characteristic function 
datu) = exp А f (P —1) аға). 


Conversely, if $G$ is a bounded, non-decreasing function over $(-\infty, \infty)$, and if a random
variable $Z$ has as its characteristic function

$$\phi_Z(u) = \exp\Big\{ \int_{-\infty}^{\infty} (e^{iux} - 1)\, dG(x) \Big\},$$

then $Z$ has the same distribution as a random sum,

$$U_1 + \cdots + U_V,$$

where $\{U_n\}$ are independent, identically distributed random variables with common
distribution function

$$F(x) = \frac{G(x) - G(-\infty)}{G(+\infty) - G(-\infty)},$$

where $V$ has a Poisson distribution with $EV = G(+\infty) - G(-\infty)$, and $V, U_1, U_2, \ldots$
are independent.
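The random-sum interpretation above can be checked numerically. The following is a minimal sketch, assuming a jump distribution concentrated on finitely many integers (the names `compound_poisson_pmf`, `cf_from_pmf`, `cf_closed_form` and the particular jump law are illustrative choices, not taken from the paper): it builds the law of $U_1 + \cdots + U_V$ by truncating the Poisson mixture and compares its characteristic function with $\exp\{\lambda \int (e^{iux} - 1)\, dF(x)\}$.

```python
import cmath
import math

def compound_poisson_pmf(lam, jump_pmf, max_terms=60):
    """Law of U_1 + ... + U_V with V ~ Poisson(lam) and i.i.d. integer jumps.

    The Poisson mixture is truncated after max_terms terms (an approximation).
    """
    total = {}
    conv = {0: 1.0}                  # law of the empty sum (V = 0)
    weight = math.exp(-lam)          # P[V = 0]
    for n in range(max_terms):
        for s, p in conv.items():
            total[s] = total.get(s, 0.0) + weight * p
        nxt = {}
        for s, p in conv.items():    # convolve once more with the jump law
            for x, q in jump_pmf.items():
                nxt[s + x] = nxt.get(s + x, 0.0) + p * q
        conv = nxt
        weight *= lam / (n + 1)      # now equals P[V = n + 1]
    return total

def cf_from_pmf(pmf, u):
    """Characteristic function computed directly from a discrete law."""
    return sum(p * cmath.exp(1j * u * s) for s, p in pmf.items())

def cf_closed_form(lam, jump_pmf, u):
    """exp(lam * integral of (e^{iux} - 1) dF(x)) for a discrete F."""
    integral = sum(q * (cmath.exp(1j * u * x) - 1) for x, q in jump_pmf.items())
    return cmath.exp(lam * integral)

# Illustrative data: jumps of size -1 or -2, Poisson mean 1.5.
jumps = {-2: 0.25, -1: 0.75}
lam = 1.5
pmf = compound_poisson_pmf(lam, jumps)
```

With these strictly negative jumps the sum is zero only when $V = 0$, so `pmf[0]` equals $e^{-\lambda}$; this is the mechanism used in the proof of Theorem 1.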


Baxter and Shapiro (1960) proved that if $X$ is a random variable with infinitely
divisible distribution (1.1) and which is bounded above, then

$$P\Big[ X \le \gamma - \int_{-\infty}^{-0} \frac{1}{x}\, dG(x) \Big] = 1,$$

the existence of the above integral being a consequence of the hypothesis of $X$ being
bounded above. We now sharpen this result.


Theorem 1: (i) If $X$ is a random variable with infinitely divisible distribution
(1.1) and is with probability one bounded above by a constant, then

$$\operatorname{ess\,sup} X = \gamma - \int_{-\infty}^{-0} \frac{1}{x}\, dG(x).$$

(ii) If $X$ is a random variable with infinitely divisible distribution (1.1) and
is bounded below by a constant with probability one, then

$$\operatorname{ess\,inf} X = \gamma - \int_{+0}^{\infty} \frac{1}{x}\, dG(x).$$

Proof: We need present only the proof to part (i) of the theorem. By
Theorem 1 in Baxter and Shapiro (1960) the following conditions hold:

(i) $\int_{c}^{\infty} dG(x) = 0$ for all $c > 0$,

(ii) $\sigma^2 = G(+0) - G(-0) = 0$, and

(iii) $\lim_{\epsilon \downarrow 0} \int_{-1}^{-\epsilon} \int_{-\infty}^{u} \frac{1+x^2}{x^2}\, dG(x)\, du < \infty$.
We denote $M(u) = \int_{-\infty}^{u} \frac{1+x^2}{x^2}\, dG(x)$ for $u < 0$. In case $M(-0) = 0$, the assertion
of the theorem is trivial. We first consider the special case where $0 < M(-0) < \infty$.
As was mentioned earlier the assumption of boundedness above implies the existence
of the integral $\int_{-\infty}^{-0} \frac{1}{x}\, dG(x)$, and thus we may write

$$\log \phi_X(u) = \int_{-\infty}^{-0} (e^{iux} - 1)\, dM(x) + iuA,$$

where $A = \gamma - \int_{-\infty}^{-0} \frac{1}{x}\, dG(x)$.


Let $Y$ be a random variable whose characteristic function is given by

$$\log \phi_Y(u) = \int_{-\infty}^{-0} (e^{iux} - 1)\, dM(x).$$

Let $\{U_n\}$ be a sequence of independent identically distributed random variables with
common distribution function

$$F(x) = \begin{cases} M(x)/M(-0) & \text{if } x < 0, \\ 1 & \text{if } x \ge 0. \end{cases}$$

Let $V$ be a random variable with Poisson distribution, $EV = M(-0)$, and such that
$V, U_1, \ldots, U_n, \ldots$ are independent. Then the distribution of $Y$ is the same as that
of $U_1 + \cdots + U_V$. Since $P[U_n < 0] = 1$ for all $n$, it follows that $P[Y \le 0] = 1$,
and

$$P[Y = 0] = e^{-M(-0)} > 0.$$

Hence $\operatorname{ess\,sup} Y = 0$, from which we obtain that $\operatorname{ess\,sup} X = A$, thus proving the
theorem in the case when $M(-0) < \infty$. Now we prove it in the case $M(-0) = \infty$.




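Since from Baxter and Shapiro (1960) we know that $\operatorname{ess\,sup} X \le A$, we need
only show that for every $\epsilon > 0$

$$\operatorname{ess\,sup} X > \gamma - \int_{-\infty}^{-\epsilon} \frac{x}{1+x^2}\, dM(x).$$

If we let

$$S_\epsilon = X - \gamma + \int_{-\infty}^{-\epsilon} \frac{x}{1+x^2}\, dM(x),$$

then it is sufficient to prove that $P[S_\epsilon > 0] > 0$ for every $\epsilon > 0$. We first observe that

$$\log \phi_{S_\epsilon}(u) = \int_{-\infty}^{-0} (e^{iux} - 1)\, dM(x) - iu \int_{-\epsilon}^{-0} \frac{x}{1+x^2}\, dM(x).$$

Let $K_\epsilon = -\int_{-\epsilon}^{-0} \frac{x}{1+x^2}\, dM(x)$.
Since $M(-0) = \infty$, then $K_\epsilon > 0$ for every $\epsilon > 0$. Let $\eta > 0$ be such that $-\eta$ is a
continuity point of $M(x)$, and let $U_\eta$ and $V_\eta$ be independent random variables such
that

$$\log \phi_{U_\eta}(u) = \int_{-\infty}^{-\eta} (e^{iux} - 1)\, dM(x),$$

$$\log \phi_{V_\eta}(u) = \int_{-\eta}^{-0} (e^{iux} - 1)\, dM(x).$$

Now the distribution of $S_\epsilon$ is the same as that of $U_\eta + V_\eta + K_\epsilon$. But $V_\eta \to 0$ in
probability as $\eta \to 0$, and $V_\eta + K_\epsilon \to K_\epsilon > 0$ in probability as $\eta \to 0$. Hence there exists
an $\eta > 0$ sufficiently small so that $P[V_\eta + K_\epsilon > 0] > 0$. Since

$$[V_\eta + K_\epsilon > 0][U_\eta = 0] \subset [S_\epsilon > 0],$$

and since (by the special case proved when $M(-0) < \infty$)

$$P[U_\eta = 0] = e^{-M(-\eta)} > 0,$$

we obtain

$$0 < P[V_\eta + K_\epsilon > 0]\, P[U_\eta = 0] \le P[S_\epsilon > 0],$$

which proves the theorem.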
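For a concrete feel for part (i) in the case $M(-0) < \infty$, here is a minimal sketch under the assumption that $G$ has finitely many atoms, all at negative points (the function name `upper_bound_data` and the numbers are hypothetical). It evaluates $M(-0) = \sum \frac{1+x^2}{x^2} g_x$, the bound $A = \gamma - \sum g_x / x$, and the probability $e^{-M(-0)}$ with which the bound is attained.

```python
import math

def upper_bound_data(gamma, atoms):
    """Quantities from Theorem 1(i) for a purely atomic G on the negative axis.

    atoms maps each point x < 0 to the mass G places there (an assumption of
    this sketch; the theorem itself allows general G).
    """
    if any(x >= 0 for x in atoms):
        raise ValueError("this sketch only covers G concentrated on x < 0")
    # M(-0): total mass of (1 + x^2) / x^2 dG(x) over the negative axis
    m_total = sum((1 + x * x) / (x * x) * g for x, g in atoms.items())
    # A = gamma - integral of (1/x) dG(x): the essential supremum of X
    ess_sup = gamma - sum(g / x for x, g in atoms.items())
    # P[X = A] = P[Y = 0] = exp(-M(-0)): no jump occurs at all
    prob_at_sup = math.exp(-m_total)
    return ess_sup, m_total, prob_at_sup

# Illustrative data: mass 0.5 at x = -1 and mass 0.4 at x = -2, gamma = 0.
ess_sup, m_total, prob_at_sup = upper_bound_data(0.0, {-1.0: 0.5, -2.0: 0.4})
```

Here $M(-0) = 1.5$ and $A = 0.7$, so the bound is attained with positive probability $e^{-1.5}$, which is why no smaller upper bound is possible.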


3. DISCRETENESS OF SAMPLE FUNCTIONS OF A SEPARABLE
CONTINUOUS DECOMPOSABLE PROCESS

Baxter and Shapiro (1960) found necessary and sufficient conditions that there
exists a constant $A$ such that almost all the sample functions of $X(t) - At$ be
non-decreasing over $[0, 1]$, where $X(t)$ is a separable process with stationary independent
increments. In this section, this result is extended to a process $X(t)$ which is a separable
continuous decomposable process (see Loève (1960), pp. 534-554). At the same
time it is sharpened in that necessary and sufficient conditions are given that almost
all sample functions over a $t$-interval $[0, 1]$ are discrete.


We shall say that a real-valued function $g$ defined over an interval $[a, b]$ is
discrete if and only if (i) it is of bounded variation, and (ii) for every $x \in [a, b]$, $g(x)$ equals
the sum of the discontinuities of $g$ in $[a, x]$. A continuous decomposable process is
a process with independent increments and is continuous at each value of $t$ with
probability 1. At each $t$, the characteristic function of $X(t)$ is given by

$$\log \phi_{X(t)}(u) = iu\, a(t) + \int_{-\infty}^{\infty} \Big( e^{iux} - 1 - \frac{iux}{1+x^2} \Big) \frac{1+x^2}{x^2}\, dG_t(x), \qquad (3.1)$$

where $a(t)$ is a continuous function of $t$, and $G_t(x)$ is for each $t$ a bounded, non-decreasing
function of $x$ and for each $x$ is a non-decreasing, continuous function of $t$. (Note:
in what follows, $G_1(x) = G_t(x)$ at $t = 1$.)


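Theorem 2: Let $X(t)$ be a separable, continuous, decomposable process,
$t \in [0, 1]$. In order that there exists a continuous (sure) function $f$ over $[0, 1]$ such that
almost all sample functions of $X(t) - f(t)$ are discrete (as defined above) it is necessary
and sufficient that

(i) $G_1(+0) - G_1(-0) = 0$,

and

(ii) $\lim_{\epsilon \downarrow 0} \Big\{ \int_{\epsilon}^{1} \int_{u}^{\infty} \frac{1+x^2}{x^2}\, dG_1(x)\, du + \int_{-1}^{-\epsilon} \int_{-\infty}^{u} \frac{1+x^2}{x^2}\, dG_1(x)\, du \Big\} < \infty$.

In such a case,

$$f(t) = a(t) - \int_{-\infty}^{\infty} \frac{1}{x}\, dG_t(x).$$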


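Proof: We first prove that conditions (i) and (ii) are sufficient. First, we
may assume that $P[X(0) = 0] = 1$. We shall use the fact that, for $0 \le t_1 < t_2$,

$$\int_{u}^{\infty} \frac{1+x^2}{x^2}\, dG_{t_2}(x) - \int_{u}^{\infty} \frac{1+x^2}{x^2}\, dG_{t_1}(x)$$

is the (finite) expected number of jumps of $X(t)$ of magnitude $\ge u > 0$ during the
interval $[t_1, t_2]$, and

$$\int_{-\infty}^{u} \frac{1+x^2}{x^2}\, dG_{t_2}(x) - \int_{-\infty}^{u} \frac{1+x^2}{x^2}\, dG_{t_1}(x)$$

is the expected number of jumps of $X(t)$ of magnitude $\le u < 0$ during $[t_1, t_2]$.
Because the number of jumps of any magnitude outside a neighbourhood of 0 is a
non-negative random variable we obtain (from the additivity property of expectation)
that if $0 \le t \le 1$, then

$$\int_{u}^{\infty} \frac{1+x^2}{x^2}\, dG_t(x) \le \int_{u}^{\infty} \frac{1+x^2}{x^2}\, dG_1(x) \quad \text{for } u > 0,$$

and

$$\int_{-\infty}^{u} \frac{1+x^2}{x^2}\, dG_t(x) \le \int_{-\infty}^{u} \frac{1+x^2}{x^2}\, dG_1(x) \quad \text{for } u < 0.$$

Thus, condition (ii) implies that

$$\lim_{\epsilon \downarrow 0} \Big\{ \int_{\epsilon}^{1} \int_{u}^{\infty} \frac{1+x^2}{x^2}\, dG_t(x)\, du + \int_{-1}^{-\epsilon} \int_{-\infty}^{u} \frac{1+x^2}{x^2}\, dG_t(x)\, du \Big\} < \infty \qquad (3.2)$$

for all $t \in [0, 1]$.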


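Now let $U(t)$, $V(t)$ be two independent separable continuous decomposable
processes such that

$$\log \phi_{U(t)}(u) = \int_{+0}^{\infty} \Big( e^{iux} - 1 - \frac{iux}{1+x^2} \Big) \frac{1+x^2}{x^2}\, dG_t(x),$$

and

$$\log \phi_{V(t)}(u) = \int_{-\infty}^{-0} \Big( e^{iux} - 1 - \frac{iux}{1+x^2} \Big) \frac{1+x^2}{x^2}\, dG_t(x).$$

Because of condition (i) we may (effectively) write

$$X(t) = a(t) + U(t) + V(t).$$

By (3.2) and integration by parts we have

$$\lim_{\epsilon \downarrow 0} \int_{\epsilon}^{1} \int_{u}^{\infty} \frac{1+x^2}{x^2}\, dG_t(x)\, du = \lim_{\epsilon \downarrow 0} \Big\{ \int_{1}^{\infty} \frac{1+x^2}{x^2}\, dG_t(x) - \epsilon \int_{\epsilon}^{\infty} \frac{1+x^2}{x^2}\, dG_t(x) + \int_{\epsilon}^{1} \frac{1+x^2}{x}\, dG_t(x) \Big\} < \infty. \qquad (3.3)$$

The existence of the improper integral $\int_0^1 f(x)\, dx$, when $f(x) \to \infty$ as $x \downarrow 0$, implies that
$\epsilon f(\epsilon) \to 0$ as $\epsilon \downarrow 0$. Hence condition (ii) implies that

$$\lim_{\epsilon \downarrow 0} \epsilon \int_{\epsilon}^{\infty} \frac{1+x^2}{x^2}\, dG_t(x) = 0.$$

Thus (3.3) implies that

$$\lim_{\epsilon \downarrow 0} \int_{\epsilon}^{1} \frac{1+x^2}{x}\, dG_t(x) = \int_{0+}^{1} \frac{1+x^2}{x}\, dG_t(x) < \infty,$$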



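and therefore

$$\int_{0+}^{\infty} \frac{1}{x}\, dG_t(x) < \infty \quad \text{for all } t \in [0, 1].$$

In a similar manner, (3.2) implies that $\int_{-\infty}^{0-} \frac{1}{x}\, dG_t(x) > -\infty$ for all $t \in [0, 1]$.
Hence

$$U'(t) = U(t) + \int_{+0}^{\infty} \frac{1}{x}\, dG_t(x)$$

is a process whose characteristic function is given by

$$\log \phi_{U'(t)}(u) = \int_{0+}^{\infty} (e^{iux} - 1) \frac{1+x^2}{x^2}\, dG_t(x).$$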
0+ 


By Theorem 1, if $0 \le t_1 < t_2 \le 1$, $U'(t_2) - U'(t_1)$ is bounded below by zero, and
therefore almost all of the sample functions of $U'(t)$ are non-decreasing. For fixed $t$, $\epsilon > 0$,
let $U'^{(\epsilon)}(t)$ denote the sum of the jumps of $U'(\cdot)$ (or of $X(\cdot)$) during the interval $[0, t]$
of magnitude not less than $\epsilon$. Since

$$\log \phi_{U'^{(\epsilon)}(t)}(u) = \int_{\epsilon}^{\infty} (e^{iux} - 1) \frac{1+x^2}{x^2}\, dG_t(x),$$

the distribution of $U'^{(\epsilon)}(t)$ converges to that of $U'(t)$ as $\epsilon \downarrow 0$. Since, for fixed $t$, $U'^{(\epsilon)}(t)$
is monotone non-decreasing as $\epsilon \downarrow 0$, we obtain

$$U'(t) = \lim_{\epsilon \downarrow 0} U'^{(\epsilon)}(t)$$

with probability one. Hence $U'(t)$ equals the sum of all positive jumps over $[0, t]$.
Similarly, if we define

$$V'(t) = V(t) + \int_{-\infty}^{-0} \frac{1}{x}\, dG_t(x),$$

we obtain that $V'(t)$ is the sum of all the negative jumps during $[0, t]$. Now it is known
that almost all of the sample functions of $X(t)$ have discontinuities only of the first
kind. Hence by condition (i) we may now conclude that

$$X(t) - U'(t) - V'(t) = a(t) - \int_{-\infty}^{\infty} \frac{1}{x}\, dG_t(x)$$

is a continuous sure function, and the sufficiency is proved.


Conversely, let us suppose that there exists a continuous function $f$ over $[0, 1]$
such that $X(t) - f(t)$ is a random function for which almost all sample functions are
discrete. Let $Y(t)$ denote the sum of all positive jumps in $[0, t]$, and let $Z(t)$ denote
the sum of all negative jumps in $[0, t]$. If $Y_\epsilon(t)$ denotes the sum of those positive
jumps in $[0, t]$ which are in magnitude not less than $\epsilon > 0$, then

$$\log \phi_{Y_\epsilon(t)}(u) = \int_{\epsilon}^{\infty} (e^{iux} - 1) \frac{1+x^2}{x^2}\, dG_t(x).$$

By hypothesis, $Y_\epsilon(t) \to Y(t)$ as $\epsilon \downarrow 0$ with probability one for each $t$. Hence,

$$\log \phi_{Y(t)}(u) = \int_{0+}^{\infty} (e^{iux} - 1) \frac{1+x^2}{x^2}\, dG_t(x).$$

For each $\epsilon > 0$, and each $t$, $Y_\epsilon(t)$ has an infinitely divisible distribution (1.1) with

$$\gamma = \int_{\epsilon}^{\infty} \frac{1}{x}\, dG_t(x),$$

$$G(x) = \begin{cases} G_t(x) & \text{if } x > \epsilon, \\ G_t(\epsilon) & \text{if } x \le \epsilon. \end{cases}$$

Since $Y_\epsilon(t) \to Y(t)$ as $\epsilon \downarrow 0$ with probability one, then by the well-known convergence
theorem (Loève (1960), page 300), it follows that $Y(t)$ has an infinitely divisible
distribution (1.1) with

$$\gamma = \int_{0+}^{\infty} \frac{1}{x}\, dG_t(x),$$

$$G(x) = \begin{cases} G_t(x) & \text{if } x > 0, \\ G_t(0+) & \text{if } x \le 0. \end{cases}$$


Since $Y(1)$ is a non-negative infinitely divisible random variable it is bounded below
by zero, and hence by Theorem 2 in Baxter and Shapiro (1960) we obtain

$$\lim_{\epsilon \downarrow 0} \int_{\epsilon}^{1} \int_{u}^{\infty} \frac{1+x^2}{x^2}\, dG_1(x)\, du < \infty.$$

Similar consideration for $Z(t)$ implies that

$$\lim_{\epsilon \downarrow 0} \int_{-1}^{-\epsilon} \int_{-\infty}^{u} \frac{1+x^2}{x^2}\, dG_1(x)\, du < \infty.$$

We now prove that $G_1(0+) - G_1(0-) = 0$. Since $X(t) = f(t) + Y(t) + Z(t)$, then

$$\log \phi_{X(t)}(u) = iu f(t) + \Big( \int_{-\infty}^{-0} + \int_{+0}^{\infty} \Big) (e^{iux} - 1) \frac{1+x^2}{x^2}\, dG_t(x).$$

If we compare this with (3.1)
we obtain $G_t(0+) - G_t(0-) = 0$ for all $t \in [0, 1]$. Thus conditions (i) and (ii) are
necessary.

4. DIRECT PROOF OF UNBOUNDEDNESS OF INFINITELY DIVISIBLE RANDOM VARIABLES

The proof of the unboundedness of a non-degenerate infinitely divisible random
variable given by Pakshirajan and Chatterjee (1956) is essentially a non-probabilistic
proof. The proof given by Baxter and Shapiro (1960) is a corollary to their more
comprehensive Theorems 1 and 2. Actually, this result is a direct consequence of
two rather simple lemmas.


Lemma 1: If $X$ and $Y$ are two independent random variables, and
if $Y$ is unbounded above (below), then $X + Y$ is unbounded above (below).

Proof: Suppose $Y$ is unbounded above, and let $z$ be any real number.
There does exist an $x$ such that $P[X > x] > 0$. Since $Y$ is unbounded above,
$P[Y > z - x] > 0$. Now,

$$[X > x][Y > z - x] \subset [X + Y > z],$$

and because of independence of $X$ and $Y$ we obtain

$$P[X + Y > z] \ge P[X > x]\, P[Y > z - x] > 0,$$

which proves the lemma.

Lemma 2: Let $F$ be a non-decreasing function over $[a, b]$ with $F(b) > F(a)$,
and let $X$ be a random variable with characteristic function given by
$\log \phi_X(u) = \int_a^b (e^{iux} - 1)\, dF(x)$. If $b < 0$, then $X$ is unbounded below, and if
$a > 0$, then $X$ is unbounded above.

Proof: We prove this in the case $a > 0$ only. By the discussion in Section
2 prior to the statement of Theorem 1 it follows that

$$P[X \ge na] \ge e^{-(F(b) - F(a))}\, \frac{(F(b) - F(a))^n}{n!} > 0$$

for every positive integer $n$, which proves the lemma.
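The bound in the proof of Lemma 2 is just the Poisson probability of exactly $n$ jumps, each of size at least $a$. A minimal sketch (the function name `tail_lower_bound` is hypothetical, and `mass` stands for $F(b) - F(a)$) shows that it stays strictly positive for every $n$, which is all the lemma needs:

```python
import math

def tail_lower_bound(n, mass):
    """Poisson probability of exactly n jumps: exp(-mass) * mass**n / n!.

    With all jumps of size at least a > 0, this is a lower bound for
    P[X >= n * a]; mass plays the role of F(b) - F(a).
    """
    if mass <= 0 or n < 1:
        raise ValueError("need mass > 0 and a positive integer n")
    return math.exp(-mass) * mass ** n / math.factorial(n)

# The bound shrinks quickly with n but never reaches zero.
bounds = [tail_lower_bound(n, 1.0) for n in range(1, 51)]
```

The values decrease factorially fast, so the lemma gives unboundedness but says nothing useful about how heavy the tail is.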


From Lemmas 1 and 2 the proof of the unboundedness of an infinitely divisible
random variable $Z$ with characteristic function (1.1) is immediate. Since $G$ is not
constant then (i) $G(0+) - G(0-) > 0$ or (ii) there is an interval $(a, b]$ such that $G(b)
> G(a)$, where $a > 0$ or $b < 0$. If (i) is true, then we may write (from a distributional
point of view)

$$Z = U + V,$$

where $U$ and $V$ are independent, and

$$\log \phi_U(u) = -(G(0+) - G(0-))\, u^2/2,$$

$$\log \phi_V(u) = i\gamma u + \int_{\{x \ne 0\}} \Big( e^{iux} - 1 - \frac{iux}{1+x^2} \Big) \frac{1+x^2}{x^2}\, dG(x).$$

Since $U$ has a normal distribution, it is unbounded, and by Lemma 1, $Z$ is unbounded.
If (ii) is true, then we may write

$$Z = S + T,$$

where $S$ and $T$ are independent random variables, and

$$\log \phi_S(u) = \int_a^b (e^{iux} - 1) \frac{1+x^2}{x^2}\, dG(x),$$

$$\log \phi_T(u) = iu \Big( \gamma - \int_a^b \frac{1}{x}\, dG(x) \Big) + \int_{\{x \notin (a, b]\}} \Big( e^{iux} - 1 - \frac{iux}{1+x^2} \Big) \frac{1+x^2}{x^2}\, dG(x).$$

By Lemma 2, $S$ is unbounded, and by Lemma 1, $Z$ is therefore unbounded.


REFERENCES

BAXTER, GLEN and SHAPIRO, J. M. (1960): On bounded infinitely divisible random variables. Sankhyā,
22, 253-260.
LOÈVE, MICHEL (1960): Probability Theory, D. Van Nostrand, Princeton, (second edition).
PAKSHIRAJAN, R. P. and CHATTERJEE, S. D. (1956): On the unboundedness of infinitely divisible
laws. Sankhyā, 17, 349-350.

Paper received: June, 1961.




TCHEBYCHEFF-TYPE INEQUALITIES IN TERMS OF
THE MEAN DEVIATION


By GERALD J. GLASSER
New York University


SUMMARY. In this paper, inequalities for the frequency of observations lying in an interval
are obtained in terms of the mean and deviation similar to the classical Tchebycheff inequalities.
The performance of these inequalities is compared with the usual inequalities in some interesting
special cases.


1. INTRODUCTION 
Given the values of the mean ($\mu$) and the standard deviation ($\sigma$) of a population,
the use of the well-known Bienaymé-Tchebycheff inequality suggests
distribution-free limits for the relative frequency of the observations that lie within a central
interval. This particular result, however, is a special case of a general
distribution-free inequality in terms of the mean and the standard deviation, that covers any
interval including the mean value. This is due to Selberg (1940) who showed

$$P(\mu - k_1\sigma,\ \mu + k_2\sigma) \ge \begin{cases} 0 & \text{if } k_1 k_2 \le 1, \\[4pt] \dfrac{4(k_1 k_2 - 1)}{(k_1 + k_2)^2} & \text{if } k_2(k_1 - k_2)/2 \le 1 \le k_1 k_2, \\[8pt] \dfrac{k_2^2}{1 + k_2^2} & \text{if } k_2(k_1 - k_2)/2 \ge 1, \end{cases} \qquad (1.1)$$

where $k_1 \ge k_2 > 0$ (the subscripts may be interchanged for $k_2 \ge k_1 > 0$) and where
$P(a, b)$ is the probability that a variate, $x$, satisfies $a < x < b$. Two special cases
of (1.1) have $k_1 = k_2$ and $k_1 \to \infty$ for

$$P(\mu - k\sigma,\ \mu + k\sigma) \ge \begin{cases} 0 & \text{if } k \le 1, \\ 1 - 1/k^2 & \text{if } k \ge 1 \end{cases} \qquad (1.2)$$

(the Bienaymé-Tchebycheff inequality), and

$$P(-\infty,\ \mu + k\sigma) \ge \frac{k^2}{1 + k^2} \quad \text{if } k > 0 \qquad (1.3)$$

respectively. All of these results can be somewhat improved with additional
information about the population, a minimum or maximum value, or some higher order
moment, for example (cf. Godwin, 1955). For the information given, however,
these are the best possible inequalities.
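As a hedged illustration of how the three branches of Selberg's inequality fit together, the sketch below evaluates the lower bound for $P(\mu - k_1\sigma, \mu + k_2\sigma)$: zero when $k_1 k_2 \le 1$, $k_2^2/(1 + k_2^2)$ when $k_2(k_1 - k_2)/2 \ge 1$ (with $k_2$ the smaller multiplier), and $4(k_1 k_2 - 1)/(k_1 + k_2)^2$ in between. The function name `selberg_bound` is an illustrative choice, not part of the paper.

```python
def selberg_bound(k1, k2):
    """Distribution-free lower bound for P(mu - k1*sigma < x < mu + k2*sigma)."""
    if k1 <= 0 or k2 <= 0:
        raise ValueError("k1 and k2 must be positive")
    hi, lo = max(k1, k2), min(k1, k2)      # the subscripts may be interchanged
    if hi * lo <= 1:
        return 0.0                         # interval too narrow to say anything
    if lo * (hi - lo) / 2 >= 1:
        return lo ** 2 / (1 + lo ** 2)     # strongly one-sided interval
    return 4 * (hi * lo - 1) / (hi + lo) ** 2

symmetric = selberg_bound(2.0, 2.0)        # Bienayme-Tchebycheff: 1 - 1/k^2
one_sided = selberg_bound(1e9, 2.0)        # approximates k1 -> infinity
```

With $k_1 = k_2 = 2$ the middle branch reduces to $1 - 1/4$, and letting $k_1$ grow very large recovers the one-sided value $k^2/(1 + k^2) = 0.8$ for $k = 2$.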

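This note considers comparable results in terms of the average deviation about
the mean, $\delta$, and contrasts them with those available in terms of the standard
deviation. The use of $\delta$ rather than $\sigma$ may permit a somewhat sharper inequality,
particularly when $\delta$ is only a small fraction of $\sigma$ because the latter is greatly inflated by extreme
values.

2. RESULTS IN TERMS OF $\delta$ AND COMPARISONS

Given $\delta$ rather than $\sigma$, the best possible result comparable to (1.1) shows

$$P(\mu - t_1\delta,\ \mu + t_2\delta) \ge \begin{cases} 0 & \text{if } t_1 t_2 \le (t_1 + t_2)/2, \\[4pt] 1 - \dfrac{1}{2}\Big( \dfrac{1}{t_1} + \dfrac{1}{t_2} \Big) & \text{if } t_1 t_2 \ge (t_1 + t_2)/2, \end{cases} \qquad (2.1)$$

the proof of which follows from $\delta \ge q t_1 \delta + p t_2 \delta + | q t_1 \delta - p t_2 \delta |$ where $p = 1 - P(-\infty,\ \mu + t_2\delta)$
and $q = 1 - P(\mu - t_1\delta,\ \infty)$. For the special cases $t_1 = t_2$ and $t_1 \to \infty$

$$P(\mu - t\delta,\ \mu + t\delta) \ge \begin{cases} 0 & t \le 1, \\ 1 - 1/t & t \ge 1, \end{cases} \qquad (2.2)$$

and

$$P(-\infty,\ \mu + t\delta) \ge \begin{cases} 0 & t \le 1/2, \\ 1 - 1/2t & t \ge 1/2. \end{cases} \qquad (2.3)$$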
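The mean-deviation bound has a single closed form once negative values are clipped at zero: $\max\{0,\ 1 - \tfrac{1}{2}(1/t_1 + 1/t_2)\}$. A minimal sketch follows (the function name `mean_deviation_bound` is hypothetical); passing `float("inf")` for one multiplier gives the one-sided case.

```python
def mean_deviation_bound(t1, t2):
    """Lower bound for P(mu - t1*delta < x < mu + t2*delta), delta = mean deviation."""
    if t1 <= 0 or t2 <= 0:
        raise ValueError("t1 and t2 must be positive")
    return max(0.0, 1.0 - 0.5 * (1.0 / t1 + 1.0 / t2))

central = mean_deviation_bound(2.0, 2.0)             # symmetric case: 1 - 1/t
one_sided = mean_deviation_bound(float("inf"), 2.0)  # one-sided case: 1 - 1/(2t)
```

The clipping at zero corresponds exactly to the condition $t_1 t_2 \le (t_1 + t_2)/2$, since $1 - \tfrac{1}{2}(1/t_1 + 1/t_2) \le 0$ is the same inequality rearranged.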


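The two sets of inequalities may be contrasted by considering a given interval,
$\mu - k_1\sigma = \mu - t_1\delta$ to $\mu + k_2\sigma = \mu + t_2\delta$, and comparing the lowest bounds provided
by (1.1) and (2.1). The larger lowest bound is obviously the better, and the
comparison shows that this is provided by (2.1) if any of the following conditions is satisfied

$$\begin{aligned} \frac{\delta}{\sigma} &< \frac{2 k_1 k_2}{k_1 + k_2} && \text{if } k_1 k_2 \le 1, \\[4pt] \frac{\delta}{\sigma} &< \frac{2 k_1 k_2 [(k_1 - k_2)^2 + 4]}{(k_1 + k_2)^3} && \text{if } k_1 k_2 \ge 1 \ge k_2(k_1 - k_2)/2, \\[4pt] \frac{\delta}{\sigma} &< \frac{2 k_1 k_2}{(k_1 + k_2)(1 + k_2^2)} && \text{if } k_2(k_1 - k_2)/2 \ge 1. \end{aligned} \qquad (2.4)$$

For the special case $k_1 = k_2 = k$ the condition for (2.2) to be sharper is

$$\frac{\delta}{\sigma} < \begin{cases} k & \text{if } k \le 1, \\ 1/k & \text{if } k \ge 1, \end{cases} \qquad (2.5)$$

while with $k_1 \to \infty$, (2.3) is better than (1.3) if

$$\frac{\delta}{\sigma} < \frac{2k}{1 + k^2}. \qquad (2.6)$$

These inequalities can also be put in terms of $t$ rather than $k$, but in any event clearly
depend on $\delta/\sigma$ being smaller than some function of the width of the specified interval.

For some intervals, inequalities in terms of $\delta$ and $\sigma$ can be developed that are
better than either (1.1) or (2.1), but these will not be considered here.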
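The comparison for a symmetric interval can be tried directly. The sketch below is an illustration under the assumption that the interval is $\mu \pm k\sigma$ and that the ratio $\delta/\sigma$ is known, so that $t = k\sigma/\delta$; the function names and the constant `NORMAL_RATIO` are illustrative. It evaluates both lower bounds and reports which is larger.

```python
import math

def tchebycheff_bound(k):
    """Standard-deviation bound for the interval mu +/- k*sigma."""
    return max(0.0, 1.0 - 1.0 / k ** 2)

def mean_deviation_bound(k, ratio):
    """Mean-deviation bound for the same interval; ratio = delta / sigma."""
    return max(0.0, 1.0 - ratio / k)       # 1 - 1/t with t = k / ratio

def delta_is_sharper(k, ratio):
    """True when the mean-deviation bound is strictly larger."""
    return mean_deviation_bound(k, ratio) > tchebycheff_bound(k)

NORMAL_RATIO = math.sqrt(2.0 / math.pi)    # delta / sigma for a normal population

sharper_near_one = delta_is_sharper(1.2, NORMAL_RATIO)
sharper_far_out = delta_is_sharper(3.0, NORMAL_RATIO)
```

For the normal ratio the mean-deviation bound wins at $k = 1.2$ and loses at $k = 3$, in line with the condition that $\delta/\sigma$ be below both $k$ and $1/k$.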


For a second comparison of (1.1) and (2.1), it may be noted that the inequalities
given also permit one to locate, within some interval about the mean, the $p$-th centile
of a population, $\xi_p$. Thus, (1.3) applied back and forth shows that (cf. Hoeffding,
1952, p. 173)

$$\mu - \sigma \sqrt{(1-p)/p} \le \xi_p \le \mu + \sigma \sqrt{p/(1-p)}, \qquad (2.7)$$

while (2.3) may be similarly applied to show

$$\mu - \delta/2p \le \xi_p \le \mu + \delta/2(1-p). \qquad (2.8)$$

The width of the first interval is $\sigma/\sqrt{p(1-p)}$, while the width of the second is $\delta/2p(1-p)$.
Therefore, (2.8) is sharper if

$$\frac{\delta}{\sigma} < 2\sqrt{p(1-p)}. \qquad (2.9)$$

In particular, the inequalities locate the median within one $\sigma$, and within one $\delta$, of
the mean, and since usually $\delta$ will be somewhat smaller than $\sigma$, (2.8) provides somewhat
narrower limits.
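The two sets of centile limits are easy to tabulate. This is a minimal sketch with hypothetical function names, assuming $0 < p < 1$; it returns the interval for the $p$-th centile based on $\sigma$ and the one based on $\delta$.

```python
import math

def centile_limits_sigma(mu, sigma, p):
    """Limits for the p-th centile from the one-sided sigma inequality."""
    return (mu - sigma * math.sqrt((1 - p) / p),
            mu + sigma * math.sqrt(p / (1 - p)))

def centile_limits_delta(mu, delta, p):
    """Limits for the p-th centile from the one-sided delta inequality."""
    return (mu - delta / (2 * p), mu + delta / (2 * (1 - p)))

# Median of a population with mean 10, sigma 2 and delta 1.6 (illustrative values).
median_sigma = centile_limits_sigma(10.0, 2.0, 0.5)
median_delta = centile_limits_delta(10.0, 1.6, 0.5)
```

At $p = 0.5$ both intervals are symmetric about the mean, with half-widths $\sigma$ and $\delta$ respectively, so the $\delta$-based limits are narrower whenever $\delta < \sigma$.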


3. VALUES OF $\delta/\sigma$


The relative precision of (1.1) and (2.1) in the two types of problems discussed
depends upon the value of $\delta/\sigma$, which will vary from one type of population to another.
From their definitions it follows that

$$\sigma^2 - \delta^2 = \operatorname{var} |x - \mu|$$

so that their difference depends on the variation in the absolute deviations of the
observations from their mean. Thus, for populations with extreme deviations, $\sigma$
will tend to be somewhat higher than $\delta$ and $\delta/\sigma$ will tend to be small.

A distribution-free result due to Chakravarti (1947-48) is of interest in this
connection. He shows that for $N$ variates ($N > 2$)

МУУ < je < ЛП (М odd, с Æ 0) 
(3.1) 
< 


46 < 1 (N еуеп, с #0) J 


so, for $N \to \infty$, $\delta/\sigma$ may range from 0 to 1.

Although the results given are distribution-free, it is somewhat instructive
to study their precision in the two problems discussed for various population models,
though for such models exact answers, or at least more precise inequalities, may
usually be derived.

Thus, for a normal population $\delta/\sigma = \sqrt{2/\pi} = .7979$. This suggests, for example,
that (2.5) is satisfied for $.8 < k < 1.25$ and that (2.6) holds if $.5 < k < 2.0$. Further,
(2.9) is satisfied for $.2 < p < .8$. So for a normal value of $\delta/\sigma$, the inequalities based
on $\delta$ are sharper for locating probabilities or percentiles in the central part of a
distribution. For the exponential population $\delta/\sigma = 2/e = .7358$, so similar results are
obtainable.

For various geometric models one has: (a) rectangular, $\delta/\sigma = \sqrt{3}/2 = .87$,
(b) triangular, $\delta/\sigma = \sqrt{2/3} = .82$, (c) right triangular, $\delta/\sigma = 16\sqrt{2}/27 = .84$, and
(d) V-shaped, $\delta/\sigma = \sqrt{8/9} = .94$. All values of $\delta/\sigma$ are higher than for a normal
population so that the inequalities based on $\delta$ would be better than those based on
$\sigma$ only in more limited cases.
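The quoted ratios can be reproduced by direct numerical integration. The sketch below is only a rough check, assuming a density supplied as a Python function on a finite interval and using a plain midpoint rule (the name `ratio_delta_sigma`, the step count and the truncation points are arbitrary choices of this illustration).

```python
import math

def ratio_delta_sigma(pdf, lo, hi, steps=20_000):
    """Approximate delta / sigma for a density on [lo, hi] by the midpoint rule."""
    h = (hi - lo) / steps
    xs = [lo + (i + 0.5) * h for i in range(steps)]
    ws = [pdf(x) * h for x in xs]                   # probability of each cell
    mean = sum(w * x for w, x in zip(ws, xs))
    delta = sum(w * abs(x - mean) for w, x in zip(ws, xs))
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs))
    return delta / math.sqrt(var)

ratios = {
    "rectangular": ratio_delta_sigma(lambda x: 1.0, 0.0, 1.0),
    "triangular": ratio_delta_sigma(lambda x: 1.0 - abs(x), -1.0, 1.0),
    "right triangular": ratio_delta_sigma(lambda x: 2.0 * x, 0.0, 1.0),
    "V-shaped": ratio_delta_sigma(lambda x: abs(x), -1.0, 1.0),
    # infinite supports are truncated where the remaining mass is negligible
    "exponential": ratio_delta_sigma(lambda x: math.exp(-x), 0.0, 50.0),
    "normal": ratio_delta_sigma(
        lambda x: math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi), -10.0, 10.0),
}
```

The computed values agree with the closed forms $\sqrt{3}/2$, $\sqrt{2/3}$, $16\sqrt{2}/27$, $\sqrt{8/9}$, $2/e$ and $\sqrt{2/\pi}$ to about three decimals.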

For Pareto-type distributions, however, the inequalities based on $\delta$ will very
often be sharper. If the Pareto coefficient, $\alpha$, satisfies $1.0 < \alpha < 2.0$, $\delta$ exists but $\sigma$
does not (a value of about $\alpha = 1.5$ is often observed). And even if $\alpha > 2$, $\delta/\sigma$ is likely
to be small for

$$\frac{\delta}{\sigma} = 2 \Big( \frac{\alpha - 1}{\alpha} \Big)^{\alpha - 1} \sqrt{\frac{\alpha - 2}{\alpha}}.$$

When $\alpha = 3$, the ratio is .51 while if $\alpha = 5$, it is .63. This model therefore provides
an illustration of the type of situation in which inequalities based on $\delta$ are better
over a wide range because the value of $\sigma$ is so much more affected by extreme
deviations from the mean.


4. APPLICATIONS


Applications of Tchebycheff-type inequalities are well known and need not
be discussed here in detail. It may be noted, however, that they are often useful
in analyzing secondary data where estimates of only a few descriptive parameters
have been made available. The results given here, therefore, may have some
significance for those who report such data when it is desirable to give a measure of
variability. Unless some population model can be assumed, the choice of which
measure of variability to report is often arbitrary. The results given here suggest
that if it is known that the population is one with highly extreme deviations from
the mean the value of $\delta$ might well be somewhat more meaningful, in at least one way,
to users of the data than would the value of $\sigma$.

A NOTE ON FUNCTIONAL MINIMALITY OF
SUFFICIENT STATISTICS*
By EDWARD W. BARANKIN
University of California, Berkeley


SUMMARY. Theorem 4.3 in the article of Barankin and Katz (1959) states that the minimal
dimensional sufficient statistics T* and T+, there constructed, are locally functionally minimal almost
everywhere in R (= the set of regular points), whereas in fact, the correct assertion is that they are locally
almost everywhere functionally minimal almost everywhere in R. This latter, valid statement is proved
here. Also, we give corrected forms of certain results that were derived, in our work on sufficient
parameters, from the misstated Theorem 4.3.


1. INTRODUCTION 


The purpose of this note is to correct an erroneous assertion made in the article
"Sufficient Statistics of Minimal Dimension" by Barankin and Katz (1959). Theorem
4.3 in that article states that the sufficient statistics T* and T+, which were constructed
earlier, are locally functionally minimal at every point of an open set A ⊂ R (= the set of
regular points of Ω) with R − A of Lebesgue measure 0. This is explained to mean that for
each point x ∈ A there is a neighbourhood N of x such that if T is any sufficient statistic then
T* and T+ are functions of T in N. Now, with this meaning, the affirmation of local
functional minimality is not strictly correct in general. It is true in special cases; for example,
by virtue of Lemma 2.3 in B. K. (we shall henceforth refer to the above quoted article
as B. K.) it is true if T is Euclidean and regular in A. (The reason for this will become
apparent with the ensuing discussion.) But in general, if T is just any sufficient statistic
according to the characterization given by Lemma 1.1 in B. K., the correct assertion is that
T* and T+ are locally almost everywhere functionally minimal everywhere in A. That is,
for each x ∈ A there is a neighbourhood N of x such that if T is any sufficient statistic then
there is a subset C_N of N, of Lebesgue measure 0, such that T*(x') = T*(x'') and
T+(x') = T+(x'') whenever x' and x'' are points of N − C_N with T(x') = T(x'').

The proof of Theorem 4.3 in B. K. was reduced to the following step: for a certain
collection of indices, $j_1, j_2, \ldots, j_r$, and a corresponding collection of points, $\theta^{01}, \theta^{02}, \ldots, \theta^{0r}$,
of $\Theta$, to show that each of the functions

$$\Big( \frac{\partial \log p}{\partial \theta_{j_k}} \Big)_{x,\ \theta^{0k}} \qquad (1.1)$$

is a function of an arbitrary, given sufficient statistic T. To establish this we proceeded from
the characterizing equation for the sufficient statistic T,

$$p(x, \theta) = f(T(x), \theta)\, g(x), \qquad (1.2)$$

taking the logarithms, differentiating with respect to $\theta_{j_k}$ and then evaluating at $\theta^{0k}$. This
gives the result

$$\Big( \frac{\partial \log p}{\partial \theta_{j_k}} \Big)_{x,\ \theta^{0k}} = \Big( \frac{\partial \log f(T(x), \theta)}{\partial \theta_{j_k}} \Big)_{\theta^{0k}}, \qquad (1.3)$$

from which it then appears as evident that (1.1) is strictly a function of T. But this
procedure is not generally valid. And this is because (1.2) is not given to hold for all x and θ. From
Lemma 1.1 in B. K. we see that what we may assert, in general, regarding (1.2) is that for
each θ ∈ Θ it holds for almost all (Lebesgue) x ∈ Ω. Under these circumstances we cannot
draw the conclusion that f has the θ-differentiability properties that are needed to obtain
(1.3). Thus, the result of strict functionality cannot be asserted.

If T is Euclidean and regular at a point $x^0$, then Lemma 2.3 in B. K. provides
us with functions f and g in (1.2) which are suitably differentiable and such that (1.2) holds
for all x in some neighbourhood of $x^0$ and all θ ∈ Θ. In this case, the above argumentation is
valid for all x in a neighbourhood of $x^0$, and one obtains the correct result that (1.1) is strictly
a function of T locally about $x^0$. This is, of course, a very special situation.

*This paper was prepared with the partial support of the Office of Naval Research (Nonr 222-43).
This paper in whole or in part may be reproduced for any purpose of the United States Government.


In Section 2 below we shall state and prove the valid form of Theorem 4.3 of B. K.

There has so far appeared in the literature only one application of the incorrect
Theorem 4.3; this is in the article of Barankin (1960). There, in Theorem 5.5 and Corollary
5.5.1, we have obtained consequences regarding the identifiability of parameters. In Section
3 below we shall give corrected statements of these parametric results. Once again it will
be a matter merely of substituting almost everywhere functionality for everywhere
functionality.

We ask the reader's indulgence in having before him, as he reads this note, the two
papers referred to above (see the References at the end of the note). Indeed, if we were to
attempt to make this note self-contained, we should have to devote several times the present
number of pages to purely preliminary discussion.


2. CORRECTION OF THEOREM 4.3 IN B. K. (1959)


We wish to prove now the following:

Theorem 4.3 (B. K., 1959, corrected form): The sufficient statistics T* and
T+, of the preceding two theorems, are locally almost everywhere (Lebesgue) functionally
minimal at almost all (Lebesgue) points of R. That is, for each point x in a set A ⊂ R, with
R − A of measure 0, there is a neighbourhood N of x such that if T is any sufficient statistic then
there is a subset C_N of N, of measure 0, such that T*(x') = T*(x'') and T+(x') = T+(x'') whenever
x' and x'' are points of N − C_N with T(x') = T(x'').

Proof: It is readily seen that the argument adduced in B. K. to derive the stated
assertion for T+ from that for T* is applicable in the present situation. That is, in a word,
T+ is strictly a function of T* on each K_s, and therefore if T* is almost everywhere a
function of T in K_s then it follows that the same is true of T+.

Let us then consider T*. The set A is taken to be the disjoint union of open
(n-dimensional) intervals, ⋃_s K_s. For a point x in one of the K_s, we take this K_s to be the
neighbourhood N spoken of in the theorem. On the cell K_s the statistic T* is, for some r,
an r-dimensional vector function whose components are of the form (1.1) for some $j_k$ and
$\theta^{0k}$, k = 1, 2, ..., r. Since there are only countably many of these component functions, it
clearly suffices to show that, if T is any sufficient statistic, then each of the functions is almost
everywhere a function of T in K_s. And this will follow a fortiori if we show that any function
of the form (1.1) is almost everywhere a function of T in Ω. It is this that we now proceed
to establish.


Let j be any particular integer between 1 and ν, inclusive, and $\theta^0$ be any particular
point of $\Theta$; and define

$$\gamma(x) = \Big( \frac{\partial \log p}{\partial \theta_j} \Big)_{x,\ \theta^0}. \qquad (2.1)$$

Let T be a sufficient statistic; thus, there are functions f and g such that, for each θ ∈ Θ, (1.2)
holds for almost all x ∈ Ω. Specifically, for a given θ, let $C_\theta$ denote the set of points x ∈ Ω
for which (1.2) does not hold. $C_\theta$ has, then, Lebesgue measure 0, for each θ.

If t is any real number, let the point θ(t) be defined as follows:

$$\theta(t) = (\theta_1^0, \theta_2^0, \ldots, \theta_{j-1}^0, \theta_j^0 + t, \theta_{j+1}^0, \ldots, \theta_\nu^0). \qquad (2.2)$$

Thus, in particular, $\theta(0) = \theta^0$. Since $\Theta$ is an open set there is a positive number $t_0$ such
that $\theta(t) \in \Theta$ for all t in the interval $(0, t_0)$. Let $t_1 > t_2 > \ldots$ be a monotone decreasing sequence
of numbers in $(0, t_0)$ converging to 0. Let $C = C_{\theta^0} \cup \bigcup_{m=1}^{\infty} C_{\theta(t_m)}$. Clearly C has Lebesgue
measure 0. And furthermore, if x ∈ Ω − C then (1.2) holds for $\theta = \theta^0$ and $\theta = \theta(t_m)$ for all
m = 1, 2, ....

According to this last property of C we obtain the following from (1.2):

$$\frac{\log p(x, \theta(t_m)) - \log p(x, \theta^0)}{t_m} = \frac{\log f(T(x), \theta(t_m)) - \log f(T(x), \theta^0)}{t_m}, \qquad (2.3)$$

m = 1, 2, ...; x ∈ Ω − C.

As m → ∞, the left-hand side of (2.3) approaches γ(x). Therefore, the limit of the quantities
on the right-hand side exists also, and is likewise γ(x). But we see that the quantities on the
right form identical sequences for two points, x' and x'', in Ω − C such that T(x') = T(x'').
It therefore follows that γ is a function of T on Ω − C. Since C is of measure 0 we have that
γ is almost everywhere a function of T in Ω, and this completes the proof of the theorem.
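The difference-quotient argument can be watched at work in a simple case. The sketch below assumes an independent N(θ, 1) sample with sufficient statistic T(x) = sum of the observations (this model, the function names and the numbers are illustrative assumptions; in this example the factorization holds for every x, so no exceptional null set arises). Two samples with the same value of T give identical difference quotients of log p for every step t, and hence the same limit.

```python
import math

def log_p(xs, theta):
    """Log-density of an independent N(theta, 1) sample xs."""
    return sum(-0.5 * (x - theta) ** 2 - 0.5 * math.log(2 * math.pi) for x in xs)

def difference_quotient(xs, theta0, t):
    """(log p(x, theta0 + t) - log p(x, theta0)) / t."""
    return (log_p(xs, theta0 + t) - log_p(xs, theta0)) / t

x_first = [0.0, 1.0, 2.0]
x_second = [1.0, 1.0, 1.0]        # different sample, same T(x) = 3
theta0 = 0.3

gaps = [abs(difference_quotient(x_first, theta0, 1.0 / m)
            - difference_quotient(x_second, theta0, 1.0 / m))
        for m in range(1, 50)]
limit_estimate = difference_quotient(x_first, theta0, 1e-6)
```

Here the quotient equals $T(x) - n\theta^0 - nt/2$, so it depends on the sample only through T, and its limit as t tends to 0 is $T(x) - n\theta^0 = 2.1$.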


3. CORRECTION OF THEOREMS 2.5 AND 5.5 AND COROLLARY 5.5.1 IN
BARANKIN (1960)

In the article B. (1960) (we shall henceforward refer in this way to the second item
in the References at the end of this note) we showed that sufficient parameters of our family
of distributions, 𝒫, may be viewed as sufficient statistics of a related posterior family, and
conversely. (However, regarding the converse, see the discussion on p. 108 et seq. of B.
(1960).) This remarkable fact enabled us to carry all of the article B. K. (1959) over to
questions concerning sufficient parameters. And in particular, assertions about functional
minimality of sufficient statistics yield assertions about identifiability of parameters.

The statement of Theorem 4.3 for T* was quoted in B. (1960) as Theorem 2.5. We
first give here the proper restatement of that theorem.

Theorem 2.5 (B., 1960, corrected form): The sufficient statistic T* of the preceding
theorem is locally almost everywhere (Lebesgue) functionally minimal at almost all (Lebesgue)
points of R. That is, there is an open set A ⊂ R, with R − A of Lebesgue measure 0, such that
for each x ∈ A there is a neighbourhood N of x with the property that if T is any sufficient statistic
for 𝒫 in N then there is a subset C_N of N, of measure 0, such that T*(x') = T*(x'') whenever
x' and x'' are points of N − C_N with T(x') = T(x'').

It will be noticed that this statement does not refer to a sufficient statistic, T, for
𝒫, but to a sufficient statistic for 𝒫 in N. We have defined this slightly more general notion
in Definition 2.1 of B. (1960). The reader will have no difficulty in seeing that the present
theorem is implied by the theorem of Section 2 above; it is necessary only to observe that
if T is sufficient for 𝒫 in N then the function which is equal to T in N and equal to the
identity function in Ω − N is a sufficient statistic for 𝒫.

In B. (1960) we gave three definitions bearing on the notion of identifiability. It
will be necessary now to introduce also the notion of almost everywhere identifiability.
Let us give formal definitions as follows (cf. Definitions 5.2, 5.3 and 5.4 in B. (1960)):

Definition: A sufficient parameter, U, for 𝒫 is said to be almost everywhere
(Lebesgue) identifiable if there is a subset D of Θ, of (Lebesgue) measure 0, such that

$$\mu_\theta = \mu_{\theta'} \ \text{ implies } \ U(\theta) = U(\theta') \qquad (3.1)$$

for all θ and θ' in Θ − D.

A parameter U of 𝒫, which is sufficient for 𝒫 in F (a measurable subset of Θ), is
said to be almost everywhere identifiable in F if there is a subset D of F, of measure 0,
such that (3.1) holds for all θ and θ' in F − D.

A parameter U of 𝒫, which is sufficient for 𝒫 about $\theta^0$, is said to be almost
everywhere identifiable about $\theta^0$ if it is almost everywhere identifiable in some neighbourhood
of $\theta^0$.

The reader is referred to B. (1960) for all attendant notions, such as sufficiency of
a parameter in a given set and sufficiency of a parameter about a given point.

We are now ready to state the corrected forms of Theorem 5.5 and Corollary 5.5.1.
And it will not be necessary to supply any revised detail of proof. For, the argumentation
given in B. (1960) remains fully valid when almost everywhere identifiability is substituted
for identifiability and the above corrected form of Theorem 2.5 is substituted for the invalid
one.
Theorem 5.5 (B., 1960, corrected form): The sufficient parameter U* of the preceding theorem is locally almost everywhere (Lebesgue) functionally minimal at almost all (Lebesgue) points of $R_0$. That is, there is an open set $B \subset R_0$, with $R_0 - B$ of Lebesgue measure 0, such that for each $\theta \in B$ there is a neighbourhood N of $\theta$ with the property that if U is any sufficient parameter for $\mathcal{P}$ in N then there is a subset $D_U$ of N, of measure 0, such that $U^*(\theta') = U^*(\theta'')$ whenever $\theta'$ and $\theta''$ are points of $N - D_U$ with $U(\theta') = U(\theta'')$.

Corollary 5.5.1 (B., 1960, corrected form): The sufficient parameter U* is almost everywhere (Lebesgue) identifiable about each point $\theta \in B$.
MINIMAX SOLUTIONS TO TRICHOTOMIES


By L. 1. LASMAN
Florida State University 


SUMMARY. When a three-valued hypothesis is to be tested and actions are available for accepting one from the three, it is shown that under rather general conditions, the minimax solution to this problem depends upon a pair of values $x'$ and $x''$ with $x' < x''$. These are obtained by equating the risks under possible alternatives. The decision then is to choose the smallest hypothesized parameter if $\Sigma x < x'$, choose the largest if $\Sigma x > x''$ and choose the middle value otherwise.


1. INTRODUCTION 

The explicit method of finding Bayes’ and minimax solutions in the problem of testing dichotomies is quite simple and well known for a very general class of probability distributions and is set out by Blackwell and Girshick (1954). An article by Karlin and Rubin (1956) contains a general theory of the class of Bayes’ solutions, and this paper presents an explicit method of obtaining minimax solutions to trichotomies for a more specific family of distributions.

The distribution to be considered is one of the exponential type

$$f(x, \theta) = b(\theta)\, e^{\theta x} g(x), \qquad (1.1)$$

where $b(\theta)$ is a positive function of a real-valued parameter $\theta$, and $g(x)$ is a positive function of the real variable x. The normal distribution with known variance, and the binomial and Poisson distributions are of this type.

We are concerned with testing a three-valued hypothesis, that is, with the problem of accepting one of the three following hypotheses:

$$H_1: \theta = \theta_1, \qquad H_2: \theta = \theta_2, \qquad H_3: \theta = \theta_3,$$

where we may assume $\theta_1 < \theta_2 < \theta_3$.

Let $x_1, ..., x_n$ denote the values of n independent observations on x; let $a_i$ denote the action which accepts $H_i$ and let $\phi(a_i|x)$ denote the probability of taking action $a_i$, given the sample point $x = (x_1, ..., x_n)$. We require that for each x,

$$0 \le \phi(a_i|x) \le 1 \quad \text{and} \quad \sum_i \phi(a_i|x) = 1.$$

If in particular $\phi(a_i|x) = 1$ for only one i, then we have the customary non-randomized test procedure.
Let $w_{ij}$ represent the loss incurred from taking action $a_j$ when the hypothesis $H_i$ is actually true. We will assume that the $w_{ij}$ are non-negative, that there is no loss from choosing the most appropriate action corresponding to the actual parameter value, and that there cannot be a smaller penalty if one accepts a hypothesis farther from the correct one. This defines a loss matrix as:

$$\begin{array}{c|ccc} & \theta_1 & \theta_2 & \theta_3 \\ \hline a_1 & 0 & w_{21} & w_{31} \\ a_2 & w_{12} & 0 & w_{32} \\ a_3 & w_{13} & w_{23} & 0 \end{array}$$

where $w_{13} \ge w_{12}$ and $w_{31} \ge w_{32}$.

Let $\xi_i$, i = 1, 2, 3, be the a priori probability that $\theta_i$ is the true value of $\theta$, where $\xi_i \ge 0$ and $\sum_i \xi_i = 1$. Denote the vector $(\xi_1, \xi_2, \xi_3)$ by $\xi$. We shall define the risk corresponding to nature’s choice of an a priori distribution $\xi$ and the statistician’s choice of a test procedure $\phi$ as

$$R(\xi, \phi) = \int \cdots \int \sum_i \sum_j w_{ij}\, \xi_i\, \phi(a_j|x)\, f(x, \theta_i)\, dx = \sum_i \xi_i R(\theta_i, \phi) = \int \cdots \int \sum_i \phi(a_i|x)\, \rho(a_i|x)\, dx \qquad (1.2)$$

where the region of integration is the sample space. $R(\theta_i, \phi)$ is the risk from choosing $\phi$ when $\theta_i$ is the value of the parameter and $\rho(a_i|x)$ the risk in taking action $a_i$ after observing x. In particular

$$\rho(a_i|x) = \sum_{j=1}^{3} w_{ji}\, \xi_j f(x, \theta_j). \qquad (1.3)$$


Then it is known that there exists a pair $(x', x'')$ with $x' < x''$ such that the Bayes’ solution, one which minimizes $R(\xi, \phi)$, is choosing $a_1$ when $\Sigma x < x'$, $a_2$ when $x' < \Sigma x < x''$, and $a_3$ when $\Sigma x > x''$. For $\Sigma x = x'$ or $\Sigma x = x''$, randomization may be required in the discrete cases. The numbers $x'$ and $x''$ have been obtained such that $\rho(a_1|x) = \rho(a_2|x)$ for $\Sigma x = x'$ and $\rho(a_2|x) = \rho(a_3|x)$ for $\Sigma x = x''$.


2. THE MINIMAX SOLUTION
By a minimax test $\phi^*$, we mean that

$$\sup_\xi R(\xi, \phi^*) \le \sup_\xi R(\xi, \phi) \quad \text{for all } \phi.$$


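For a test of the form of the Bayes’ solution just given, the functions $R(\theta_i, \phi)$ can be written as

$$R(\theta_1, \phi) = w_{12} P_{\theta_1}(x' < \Sigma x < x'') + w_{13} P_{\theta_1}(\Sigma x > x'') \qquad (2.1)$$
$$R(\theta_2, \phi) = w_{21} P_{\theta_2}(\Sigma x < x') + w_{23} P_{\theta_2}(\Sigma x > x'') \qquad (2.2)$$
$$R(\theta_3, \phi) = w_{31} P_{\theta_3}(\Sigma x < x') + w_{32} P_{\theta_3}(x' < \Sigma x < x''). \qquad (2.3)$$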
It will now be shown that if a pair $(x', x'')$ with $x' < x''$ exists so that equality holds among the three risks, then the Bayes’ test above is minimax. Call such a test $\phi^*$. Then we can observe that for this $\phi^*$

$$R(\xi, \phi^*) = \sum_i \xi_i R(\theta_i, \phi^*) = V \sum_i \xi_i = V, \qquad (2.4)$$

where V is the common risk to the statistician regardless of which hypothesis is true.

It remains to be shown that there is an a priori distribution $\xi^*$ such that

$$R(\xi^*, \phi) \ge R(\xi^*, \phi^*) = V \qquad (2.5)$$

for we have found that $V = \sup_\xi R(\xi, \phi^*)$.


Let $z = e^{\Sigma x}$, $z' = e^{x'}$ and $z'' = e^{x''}$. It must be shown that for the choice of $x'$ and $x''$ such that equality holds among equations (2.1), (2.2), and (2.3), there corresponds a vector $(\xi_1, \xi_2, \xi_3)$ of a priori probabilities satisfying $\rho(a_1|x) = \rho(a_2|x)$ for $z = z'$ and $\rho(a_2|x) = \rho(a_3|x)$ for $z = z''$. But this is equivalent to showing that there exists a positive point in the $\xi_1', \xi_3'$-plane corresponding to the choice of $(x', x'')$, where $\xi_1' = \xi_1/\xi_2$ and $\xi_3' = \xi_3/\xi_2$.

Dividing each $\rho(a_i|x)$ by $\xi_2 z^{\theta_2}$, setting $\rho(a_1|x) = \rho(a_2|x)$ for $z = z'$ and $\rho(a_2|x) = \rho(a_3|x)$ for $z = z''$, a solution for $\xi_1'$ and $\xi_3'$ can be obtained. These, in determinantal form, are

5(6;)(шла- wa) 0,0—8 
, —b(05)wss 10,500) =b)” р, 
а 5 г, 
wyjb(0,)z 9 — 9 b(05)(w,s—wss) 
, b(0, (uy, — 0; 0-99 —b(0.)wys Т; 
Ё D <i 
h 5 ws, (0, yz 02 — 05" ХИ { 
where AU й 
00100: иде — 6.) #8) 


Both D and D, are easily shown to be negative. In order for £,'to be positive, D; 
must be negative. This is so if 

Way, joe алы — (Wa — ttg; (t, — twp)" 
and this inequality is implied by the hypothesis that a pair $(x', x'')$ exists such that $R(\theta_1, \phi) = R(\theta_2, \phi) = R(\theta_3, \phi)$.

Now $\xi_1' > 0$ and $\xi_3' > 0$. This defines the triple $(\xi_1, \xi_2, \xi_3)$, and it is easily seen that it has the properties of a probability distribution.

Suppose that this distribution is $\xi^*$. Then by construction of $\xi^*$, the Bayes’ solution against it is of the form specified in Section 1. This is the minimax solution since the risk to the statistician if he uses $\phi^*$ is at most V, and if nature uses the distribution $\xi^*$, then the smallest risk to the statistician is V corresponding to the Bayes’ risk against this distribution.
3. AN EXAMPLE


As a simple example consider the normal distribution with variance known to be unity and the following set of hypotheses:

$$H_1: \mu = -1, \qquad H_2: \mu = 0, \qquad H_3: \mu = 1,$$

with the loss matrix

$$\begin{array}{c|ccc} & \theta_1 & \theta_2 & \theta_3 \\ \hline a_1 & 0 & 1 & 2 \\ a_2 & 1 & 0 & 1 \\ a_3 & 2 & 1 & 0 \end{array}$$


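The risks $R(\theta_i, \phi)$ when a single observation is taken then become

$$R(\theta_1, \phi) = \int_{x'}^{x''} f(x, -1)\, dx + 2\int_{x''}^{\infty} f(x, -1)\, dx$$
$$R(\theta_2, \phi) = \int_{-\infty}^{x'} f(x, 0)\, dx + \int_{x''}^{\infty} f(x, 0)\, dx$$
$$R(\theta_3, \phi) = 2\int_{-\infty}^{x'} f(x, 1)\, dx + \int_{x'}^{x''} f(x, 1)\, dx.$$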


Equating $R(\theta_1, \phi)$ to $R(\theta_3, \phi)$ defines points on opposite sides of and equidistant from 0. Setting these equal to $R(\theta_2, \phi)$ gives unique values to these numbers. Using tables of the normal distribution it is readily found that $x' = -.763$ and $x'' = .763$. Thus the minimax solution is to take the observation and, if it is less than $-.763$, accept the mean as $-1$; if it is greater than .763, choose $\mu = 1$; and choose $\mu = 0$ otherwise.
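The cut-off points of this example can be checked numerically. The following is a minimal sketch, not the author's computation: it assumes the symmetric rule with cut-offs $-c$ and $c$, evaluates the three risks with the normal distribution function, and finds c by bisection on the difference of two risks. The function names `normal_cdf`, `risks` and `minimax_cutoff` are illustrative only.

```python
import math


def normal_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def risks(c):
    """Risks under mu = -1, 0, 1 of the rule with cut-off points -c and c."""
    # mu = -1: loss 1 for accepting mu = 0, loss 2 for accepting mu = 1.
    r1 = (normal_cdf(c + 1.0) - normal_cdf(1.0 - c)) + 2.0 * (1.0 - normal_cdf(c + 1.0))
    # mu = 0: loss 1 for either wrong decision.
    r2 = 2.0 * (1.0 - normal_cdf(c))
    # mu = 1: loss 2 for accepting mu = -1, loss 1 for accepting mu = 0.
    r3 = 2.0 * normal_cdf(-c - 1.0) + (normal_cdf(c - 1.0) - normal_cdf(-c - 1.0))
    return r1, r2, r3


def minimax_cutoff(lo=0.0, hi=3.0, tol=1e-10):
    """Bisection on R(theta_1) - R(theta_2), which changes sign once on [lo, hi]."""
    def gap(c):
        r1, r2, _ = risks(c)
        return r1 - r2

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


cutoff = minimax_cutoff()
print(round(cutoff, 3), [round(r, 4) for r in risks(cutoff)])
```

At the returned cut-off the three risks coincide, which is the equal-risk condition used in Section 2.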


A practical application of the method using a discrete distribution might be in the area of acceptance sampling where a purchaser has three alternative actions to choose from: (1) accept the lot, (2) accept the lot at a discount when it may be composed of "seconds", (3) reject the lot. The lot may be good, poor, or bad depending on the value of the proportion defective, $p_1 < p_2 < p_3$. In such a case, randomization may be necessary.
USE OF ‘ORDER-STATISTIC’ IN SAMPLING WITHOUT
REPLACEMENT


By P. K. PATHAK
Indian Statistical Institute 


SUMMARY. In sampling without replacement from a finite population, the order in which the units are selected is immaterial for the purpose of estimation. This point was noted by Basu (1958) and Murthy (1957). Basu showed that the ‘order-statistic’ (sample units arranged in ascending order of their unit indices) forms a sufficient statistic, and, therefore, any estimator which is not a function of the order-statistic can be uniformly improved by the use of the Rao-Blackwell theorem. In this paper, certain results obtained by Murthy are shown to be immediate consequences of the above observation.


It is shown that sampling with different probabilities with replacement, until we get a specified number of distinct units, is equivalent in some sense to sampling with different probabilities without replacement. Some other related problems are also considered here.


1. INTRODUCTION 


Let $Y_1, Y_2, ..., Y_N$ be the Y-characteristic of the N population units under study. Let $P_j$ be the probability associated with the j-th population unit (j = 1, ..., N). For simplicity, we shall always refer to population units by capital letters and sample units by small letters, e.g., $y_i$ and $p_i$ denote the variate value and the probability of selection associated with the i-th sample unit respectively. In this paper, we shall throughout follow the notation used by Basu (1958).


2. SAMPLING WITHOUT REPLACEMENT
In sampling without replacement from the above population a particular sample may be recorded as

$$s = (x_1, x_2, ..., x_n),$$

where $x_i = (y_i, p_i, u_i)$ (i = 1, 2, ..., n), and n is the sample size.

The probability of drawing such a particular sample is given by

$$P(s) = \frac{p_1 p_2 \cdots p_n}{(1-p_1)(1-p_1-p_2) \cdots (1-p_1-\cdots-p_{n-1})}. \qquad (2.1)$$


If we record the ‘order-statistic’ by

$$T = (x_{(1)}, x_{(2)}, ..., x_{(n)}),$$




where $x_{(i)} = (y_{(i)}, p_{(i)}, u_{(i)})$ is the i-th order-statistic (i = 1, ..., n), we have

$$P(T) = \Sigma' \frac{p_1 p_2 \cdots p_n}{(1-p_1)(1-p_1-p_2) \cdots (1-p_1-\cdots-p_{n-1})}, \qquad (2.2)$$

where the summation is taken over all possible samples giving rise to the ‘order-statistic’ T.


It has been shown by Basu (1958) that T is a sufficient statistic. Thus, if g(s) is some estimator depending only on s, by the Rao-Blackwell theorem, a uniformly better estimator than g(s) is given by E(g(s)|T). For any convex loss function, the risk associated with E(g(s)|T) does not exceed the risk associated with g(s).


3. SAMPLING WITH REPLACEMENT: NUMBER OF
DISTINCT UNITS FIXED IN ADVANCE
In this case, units are drawn with unequal probabilities and with replacement until we get a specified number ‘n’ of distinct units. If r denotes the number of draws in a particular case, the sample s may be recorded as

$$s = (x_1, ..., x_r).$$

If we define the ‘order-statistic’ T by

$$T = [x_{(1)}, x_{(2)}, ..., x_{(n)}],$$

where $x_{(i)}$ is the i-th ‘order-statistic’ (i = 1, 2, ..., n), it is not difficult to show that


ЕТ) = el 2 Фа (ра... HP Өр. ры t 


+(e Opay], (31) 


where $\Sigma''$ denotes the summation over all possible combinations out of $p_{(1)}, ..., p_{(i-1)}, p_{(i+1)}, ..., p_{(n)}$, and the term inside the square brackets denotes the probability of getting T in r draws. Assuming without any loss of generality that $p_{(1)} + ... + p_{(n)} < 1$, we get on summing (3.1) over r


PP) = X Po [ DO Parte +P _ y Parte Paco 
=ч 1—9240)—...—2( 1—pa1—--—Po-) 


ух Po. ]. (5:9) 
ч 1-?а) 
It can be proved by induction over n that (3.2) and (2.2) are equal.


Thus, if we rely only on the ‘order-statistic’ T, the two methods of sampling are essentially the same.
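The agreement between the two sampling schemes can be illustrated by a small simulation. This is a sketch under stated assumptions rather than the inductive proof: units are labelled 0, 1, ..., the selection probabilities `p` are arbitrary illustrative values, and the helper names are hypothetical. It compares simulated frequencies of the set of distinct units under with-replacement drawing against the exact probability (2.2).

```python
import random
from itertools import combinations, permutations


def prob_ordered(order, p):
    """Probability (2.1) of drawing the units in `order` without replacement."""
    prob, remaining = 1.0, 1.0
    for unit in order:
        prob *= p[unit] / remaining
        remaining -= p[unit]
    return prob


def prob_order_statistic(units, p):
    """Probability (2.2): (2.1) summed over all orderings of the same units."""
    return sum(prob_ordered(order, p) for order in permutations(units))


def draw_until_distinct(p, n, rng):
    """Draw with replacement until n distinct units appear; return them sorted."""
    labels = list(range(len(p)))
    seen = set()
    while len(seen) < n:
        seen.add(rng.choices(labels, weights=p)[0])
    return tuple(sorted(seen))


def simulate(p, n, trials, seed=0):
    """Relative frequency of each order-statistic under the Section 3 scheme."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        t = draw_until_distinct(p, n, rng)
        counts[t] = counts.get(t, 0) + 1
    return {t: k / trials for t, k in counts.items()}


p = [0.1, 0.2, 0.3, 0.4]          # illustrative selection probabilities
freq = simulate(p, 2, 40000)
for t in combinations(range(len(p)), 2):
    print(t, round(prob_order_statistic(t, p), 4), round(freq.get(t, 0.0), 4))
```

The two printed columns should agree up to simulation error; the seed and number of trials are arbitrary choices.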


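4. IMPROVING DES RAJ’S ESTIMATORS
Des Raj (1956), in sampling without replacement, gave the following set of uncorrelated estimators of $Y = \sum_{j=1}^{N} Y_j$:

$$t_1(s) = \frac{y_1}{p_1};$$
$$t_2(s) = y_1 + \frac{y_2}{p_2}(1 - p_1);$$
$$t_i(s) = y_1 + y_2 + \cdots + y_{i-1} + \frac{y_i}{p_i}(1 - p_1 - p_2 - \cdots - p_{i-1});$$
$$t_n(s) = y_1 + y_2 + \cdots + y_{n-1} + \frac{y_n}{p_n}(1 - p_1 - p_2 - \cdots - p_{n-1}).$$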


Theorem 1: For any convex loss function a uniformly better estimator than

$$t(s) = \sum_{i=1}^{n} c_i t_i(s) \qquad \Big(\sum_{i=1}^{n} c_i = 1\Big)$$

is given by

$$E[t(s)|T] = \sum_{i=1}^{n} y_{(i)} \frac{P(T|(i))}{P(T)}, \qquad (4.2)$$

where $P(T|(i))$ is the conditional probability of getting the ‘order-statistic’ T given that the i-th order unit was drawn first.


Proof : Since

$$\frac{P[x_i = x_{(j)},\ x_{i+1} = x_{(k)} \mid T, x_1, ..., x_{i-1}]}{P[x_i = x_{(k)},\ x_{i+1} = x_{(j)} \mid T, x_1, ..., x_{i-1}]} = \frac{1 - p_1 - \cdots - p_{i-1} - p_{(k)}}{1 - p_1 - \cdots - p_{i-1} - p_{(j)}}, \qquad (4.3)$$

it follows that

$$E[t_{i+1}(s) - t_i(s) \mid T] = E\Big[\Big(\frac{y_{i+1}}{p_{i+1}} - \frac{y_i}{p_i}\Big)(1 - p_1 - \cdots - p_i) \,\Big|\, T\Big] = 0.$$

Therefore,

$$E\Big[\sum_{i=1}^{n} c_i t_i(s) \,\Big|\, T\Big] = E[t_1(s)|T] = \sum_{i=1}^{n} y_{(i)} \frac{P(T|(i))}{P(T)}.$$


Corollary 1: When n = 2, we have

$$E[t(s)|T] = \frac{1}{2 - p_{(1)} - p_{(2)}} \Big[(1 - p_{(2)}) \frac{y_{(1)}}{p_{(1)}} + (1 - p_{(1)}) \frac{y_{(2)}}{p_{(2)}}\Big].$$
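The n = 2 case can be verified by direct enumeration. The sketch below is illustrative only: it uses exact rational arithmetic, made-up values for y and p, and hypothetical helper names. It averages Des Raj's estimators over the two possible orders of a pair, weighting each order by its selection probability (2.1), and compares the result with the closed form of Corollary 1.

```python
from fractions import Fraction
from itertools import combinations, permutations


def sample_prob(order, p):
    """Probability (2.1) of drawing the units in the given order."""
    prob, remaining = Fraction(1), Fraction(1)
    for unit in order:
        prob *= p[unit] / remaining
        remaining -= p[unit]
    return prob


def des_raj(order, y, p):
    """Des Raj's estimators t_1(s), ..., t_n(s) for an ordered sample."""
    estimates, y_sum, p_sum = [], Fraction(0), Fraction(0)
    for unit in order:
        estimates.append(y_sum + y[unit] / p[unit] * (1 - p_sum))
        y_sum += y[unit]
        p_sum += p[unit]
    return estimates


def conditional_mean(units, y, p, weights):
    """E[sum_i c_i t_i(s) | T], averaging over every ordering of the set T."""
    num = den = Fraction(0)
    for order in permutations(units):
        pr = sample_prob(order, p)
        num += pr * sum(c * t for c, t in zip(weights, des_raj(order, y, p)))
        den += pr
    return num / den


def corollary_1(i, j, y, p):
    """Closed form of Corollary 1 for the unordered pair {i, j}."""
    return ((1 - p[j]) * y[i] / p[i] + (1 - p[i]) * y[j] / p[j]) / (2 - p[i] - p[j])


p = [Fraction(k, 10) for k in (1, 2, 3, 4)]   # illustrative probabilities
y = [Fraction(v) for v in (3, 7, 1, 12)]      # illustrative variate values
for i, j in combinations(range(4), 2):
    print((i, j), conditional_mean((i, j), y, p, (1, 0)), corollary_1(i, j, y, p))
```

Changing the weights passed to `conditional_mean` (for example to equal weights) leaves the result unchanged, in line with Theorem 1.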


Corollary 2: In simple random sampling without replacement

$$t_i(s) = y_1 + \cdots + y_{i-1} + (N - i + 1)\, y_i, \qquad i = 1, ..., n;$$

and

$$E[t(s)|T] = E[t_1(s)|T] = \frac{N}{n} \sum_{i=1}^{n} y_{(i)}.$$


Theorem 2: А uniformly better estimator than 


ge) = È вуду) 


n 
n2 w- 1) of ҮЗ, is given by 


E vo PIG) + $, nomo PTO, C) 


ФТ) = Ват] = — Pm NEUEN iuo 


4 


Proof : Using (4.3), it сап be seen that 


Elt(5).5) —5(5) (8)|Т| = 0, (J= 2, m) 


and ЕЙ (в) (в) —(5)t())T] — 0, (4-1,2,...)-2) 


and hence Blg(s)|7) = Еш) t5)]T] 


Z Pw P|) + X ушун ©, G) 
ісі = 


P(T) 


* Remark: The estimators (4.2) and (4.4) can also be obtained by improving the usual estimators of Y and $Y^2$ under the sampling scheme discussed in Section 3.


Corollary 3: When n = 2, we have 


и (о) б) - — 3 fy Ya (т р) fe 
imu (2—2a) —2) \ Ра) Фа) СТ) Ро) 


+ 201—0) (1-е) Zo Yo ]. 
Фа) Pr) 


Corollary 4: In simple random sampling (without replacement) 


NN-1 2 


Be Ml SB Sis ty = 


Уа yg 


The estimator (4.4) is used to derive an unbiased variance estimator of

$$\sum_{i=1}^{n} y_{(i)} \frac{P(T|(i))}{P(T)}.$$


5. IMPROVING Das’ ESTIMATORS 


The set of estimators of Y given by Das (1951) is as follows : 


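$$u_1(s) = \frac{y_1}{p_1},$$
$$u_2(s) = \frac{y_2}{p_2} \cdot \frac{(1-p_1)}{p_1} \cdot \frac{1}{(N-1)},$$
$$u_r(s) = \frac{y_r}{p_r} \cdot \frac{(1-p_1-\cdots-p_{r-1}) \cdots (1-p_1-p_2)(1-p_1)}{p_{r-1} \cdots p_2\, p_1} \cdot \frac{1}{(N-1)(N-2) \cdots (N-r+1)},$$
$$u_n(s) = \frac{y_n}{p_n} \cdot \frac{(1-p_1-\cdots-p_{n-1}) \cdots (1-p_1-p_2)(1-p_1)}{p_{n-1} \cdots p_2\, p_1} \cdot \frac{1}{(N-1)(N-2) \cdots (N-n+1)}. \qquad (5.1)$$

A uniformly better estimator than $u_r(s)$ is, therefore, given by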


1(Т) = ЕІһ8) |71 
т А 1 
= M Уч) Я и). ИО 
e ЕТ) 


P(T |21, Жз, 29 r-r Vp = тш) 


p, qw 2.2562) 
where the summation $\Sigma'$ is taken over all possible $x_1, ..., x_{r-1}$.

It is easy to see that the estimators $u_r(T)$ (r = 1, ..., n) are identical if and only if the sample is drawn by simple random sampling (without replacement). In this case (5.2) is the same as (4.2).

This shows that in simple random sampling (without replacement) the estimator based on the sample mean is more efficient than Das’ as well as Des Raj’s estimators. An unbiased estimate of $Y^2$ based on $u_r(s)$ and $y_k$ (k < r), is given by
тв) = “(8)у, + (N—1) и (з)уь (k < r = 1,2, ..., n). ... (5.3) 
A uniformly better estimator than this is given by 


$ 1 1 
Еров) |2] = У 2 [z OD cep ee 2) | 
TE < Ww P(T) "Бу 
! 1 1 
» [z агер К чайы ME {P(E |а... ак = та» 
(N—1) © (N=r+1) р, 
+ (®—1) ЭО 7H 
179 =1 ; 
2,4, r = ау) + P(T |а... My = vy ss Bay T, |] (5.4) 
АЕ AD ; OT A 


This expression will also be identical for all r and k if and only if the sample is 
drawn by simple random sampling (without replacement). 

Further, it may be seen on similar lines that in a more general (without replace- 
ment) sampling scheme which has been considered by Des Raj (1956), the estimators 
of Y (or of Y?) obtained by improving Das’ estimators will be identical if and only if 
the first unit in the sample is selected with pre-assigned probabilities and the 
remaining units are selected by simple random sampling (without replacement). 
For further reference about this, one may refer to Des Raj (1956) and Murthy (1957). 
ON THE EVALUATION OF MOMENTS OF DISTINCT
UNITS IN A SAMPLE
By P. K. PATHAK
Indian Statistical Institute
SUMMARY. In this paper, exact expressions for the moments of distinct units that appear in a sample, are derived under any sampling scheme. The importance of such a problem arises, e.g., when we select a simple random sample (with replacement) from a finite population and require the variance of the average of distinct units selected. It has been shown by Basu (1958) that this average is a better estimator of the population mean than the usual overall average.


1. PRELIMINARIES 


In this section we give a lemma which will be used in the next section.


Lemma: The coefficient $C_m(n)$ of

$$Z_1^{a_1} Z_2^{a_2} \cdots Z_m^{a_m} \quad \Big(\text{where } m \le N,\ a_i > 0 \text{ and } \sum_{i=1}^{m} a_i = n\Big)$$

in the expansion of $(Z_1 + Z_2 + \cdots + Z_N)^n$, the coefficients being added over all such sets of exponents $a_1, ..., a_m$, is given by*

$$C_m(n) = m^n - \binom{m}{1}(m-1)^n + \cdots + (-1)^{m-1}\binom{m}{m-1}(1)^n. \qquad (1.1)$$

In terms of the ‘differences of zeros’, $C_m(n)$ can be represented as

$$C_m(n) = \Delta^m 0^n = \Delta^m x^n \big|_{x=0}, \qquad (1.2)$$

where $\Delta$ is the difference operator with unit increments. We shall be using freely these two expressions (1.1) and (1.2) for $C_m(n)$, whichever will be convenient to us in subsequent sections.

Corollary 1.1: Coefficient of $Z_{i_1}^{a_1} Z_{i_2}^{a_2} \cdots Z_{i_m}^{a_m}$ (where $i_1, i_2, ..., i_m$ are any m different integers chosen out of 1, 2, ..., N; $a_i > 0$ and $\sum_{i=1}^{m} a_i = n$) in the expansion of $(Z_1 + Z_2 + \cdots + Z_N)^n$, is given by $C_m(n)$.

Corollary 1.2:

$$C_m(n) = m\,[C_m(n-1) + C_{m-1}(n-1)]. \qquad (1.3)$$


*Note that $C_m(n) = 0$ for m > n.
Corollary 1.3: For all positive integral values of n and N, we have

$$N^n = \sum_{m=1}^{N} \binom{N}{m} C_m(n). \qquad (1.4)$$
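The quantities $C_m(n)$ are easy to tabulate. The sketch below is illustrative (the function name `c_coeff` is not from the paper): it evaluates the alternating sum (1.1) and checks the recurrence (1.3) and the identity (1.4) on a small grid.

```python
from math import comb


def c_coeff(m, n):
    """C_m(n) from the alternating sum (1.1); the k = m term is 0 for n >= 1."""
    return sum((-1) ** k * comb(m, k) * (m - k) ** n for k in range(m + 1))


def check_recurrence(max_m, max_n):
    """Recurrence (1.3): C_m(n) = m [C_m(n-1) + C_{m-1}(n-1)] for n >= 2."""
    return all(
        c_coeff(m, n) == m * (c_coeff(m, n - 1) + c_coeff(m - 1, n - 1))
        for m in range(1, max_m + 1)
        for n in range(2, max_n + 1)
    )


def check_power_identity(max_N, max_n):
    """Identity (1.4): N^n = sum over m of C(N, m) C_m(n)."""
    return all(
        N ** n == sum(comb(N, m) * c_coeff(m, n) for m in range(1, N + 1))
        for N in range(1, max_N + 1)
        for n in range(1, max_n + 1)
    )


print([c_coeff(m, 4) for m in range(1, 6)])
print(check_recurrence(6, 8), check_power_identity(6, 8))
```

The first printed row lists $C_1(4), ..., C_5(4)$; the last entry is zero because m > n.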


2. MOMENTS OF DISTINCT UNITS
Positive order. Consider a population containing N units and a sampling scheme S for which
$p_i$ = probability of inclusion of the i-th unit in the sample (i = 1, ..., N);
$q_i$ = probability of exclusion of the i-th unit from the sample;
$p_{ij}$ = probability of inclusion of the i-th and j-th units in the sample;
$q_{ij}$ = probability of exclusion of the i-th and j-th units from the sample, etc.


We shall denote by $\nu$ the number of distinct units that appear in a sample. It is obvious that

$$\nu = Z_1 + Z_2 + \cdots + Z_N, \qquad (2.1)$$

where $Z_i = 1$ if the i-th unit is included in the sample, and $Z_i = 0$ otherwise.


Now, by definition, if n is any positive integer, the n-th order moment of $\nu$ is given by

$$E(\nu^n) = E(Z_1 + Z_2 + \cdots + Z_N)^n = E\Big[\sum_{m=1}^{N} \Sigma_m \Sigma_a\, Z_{i_1}^{a_1} Z_{i_2}^{a_2} \cdots Z_{i_m}^{a_m}\Big], \qquad (2.2)$$

where $\Sigma_m$ denotes the summation over $\binom{N}{m}$ combinations of m Z's chosen out of $Z_1, Z_2, ..., Z_N$ and $\Sigma_a$ denotes the summation over all products of the type $Z_{i_1}^{a_1} Z_{i_2}^{a_2} \cdots Z_{i_m}^{a_m}$ ($a_i > 0$; $\sum_{i=1}^{m} a_i = n$).


Obviously, $E(Z_{i_1}^{a_1} Z_{i_2}^{a_2} \cdots Z_{i_m}^{a_m}) = p_{i_1 i_2 \ldots i_m}$, and therefore,

$$E(\nu^n) = \sum_{m=1}^{N} C_m(n)\, \Sigma_m\, p_{i_1 i_2 \ldots i_m} \qquad (2.3)$$

and

$$E(\nu^n) = \sum_{m=1}^{N} \binom{N}{m} C_m(n)\, p_{12 \ldots m}$$

when $p_{i_1 i_2 \ldots i_m} = p_{12 \ldots m}$ for every set of m distinct units $i_1, i_2, ..., i_m$.
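The symmetric form of (2.3) can be checked against brute-force enumeration. The sketch below makes one assumption beyond the text: the sampling scheme is equal-probability sampling with replacement, for which the joint inclusion probability of m given units follows by inclusion-exclusion from the exclusion probabilities $((N-k)/N)^{\text{draws}}$. To avoid a clash with the moment order n, the sample size is called `draws`; all names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb


def c_coeff(m, n):
    """C_m(n) from (1.1)."""
    return sum((-1) ** k * comb(m, k) * (m - k) ** n for k in range(m + 1))


def joint_inclusion(m, N, draws):
    """p_{12...m}: probability that m given units all appear among the draws."""
    return sum((-1) ** k * comb(m, k) * Fraction(N - k, N) ** draws for k in range(m + 1))


def moment_formula(order, N, draws):
    """E(nu^order) from the symmetric form of (2.3)."""
    return sum(
        comb(N, m) * c_coeff(m, order) * joint_inclusion(m, N, draws)
        for m in range(1, N + 1)
    )


def moment_bruteforce(order, N, draws):
    """E(nu^order) by listing all N^draws equally likely ordered samples."""
    total = sum(len(set(s)) ** order for s in product(range(N), repeat=draws))
    return Fraction(total, N ** draws)


for order in (1, 2, 3):
    print(order, moment_formula(order, 4, 3), moment_bruteforce(order, 4, 3))
```

The enumeration grows as N^draws, so it is only meant for small illustrative cases.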
Negative order. To derive the negative moments of $\nu$ of any order under any
sampling scheme, we assume that $\nu \ge 1$*, i.e.,


в (+) =a, l E ew 


У-ш-ш-..-чу) i MN 


N 
Since 0 <2ш< (N —1), the infinite expansion is possible. Now let ! 


(1а) 165 A, at, 
Те 


j=] 14 È ашы.) | 2. (2.5) 


so that z( Wi 


Since the infinite series 1 + 5 E (u+ u+... +uy)" is bounded above by the absolu- 


tely convergent series 


it, therefore, follows that 


1 = 1 ў T7 r 96 
ЕЗ б ) = | panel № Elut... +u) |: ... (2.6) 
But it is apparent from (2.3) that 


ат 


N a 
(шш. Fiy) = E [E ИН ] 


(#-1) 
= X (Bite Аа =0. 
Mal 


А 
*It is evident that this assumption is indeed necessary, otherwise no negative moment of $\nu$ exists. In this paper, we restrict ourselves to those sampling schemes for which $P[\nu \ge 1] = 1$.
Putting this in (2.6), 


The N-th term vanishes by assumption qy... y = 0. 


we obtain 
(z) =m [+84 а a "Bia. A | 
=al 1+ ay (21012 > мА” 2 AN Ar rs ] 
2 ІШ |:-0 | 


Өш ml 1+ = (з... „)А" ( N 


which on expansion gives 
a VIAE SU b (1) т (м) 2.7 
ү) с үс "2a UE cat 7 (N—m+1) Hey Ж] no 
Inoase,q; ;. in = drea for every set (i, d ...,%,) of m distinct units, this 
reduces to 
am EDANA АШ m (м) 
=} 2 Qn. Шул асып” |. 
Corollary 2.1: Putting t = 1 in the above result, we get 
1 1 (ғ) т (т) 
Е(——|—|—. 1 „(= y 
(=) [yt (Бата. » Ug J mame] 
Sines Boum aes ш т! | 
JU UNE ay. (N—m+1 pues у Аи М N(N-1)...(N—m) 
k 1 1 1 1.2 11.2..(М- сы 
. E = ... B 
С) xl tac ant yay Set toy —1..24 Ds sa] 
(2.8) 
‚ tm) of т distinct units, 


In case, d; tas ve, fm lit т for every set (й, fp; ... 


(2.8) reduces to 
(2.8a) 


ГЕ V Aet ыд у 


Particular cases: (a) Simple random sampling (with replacement).
In this sampling scheme

$$q_{12 \ldots m} = \frac{(N-m)^n}{N^n},$$

where n is the size of the sample.
f (¥-1) n-i n-l n-1 Я 
о аа. Nm” 1 42 +..+у 
су 2 Ут] v м 


where В, is the s-th Bernoulli number. 


For large N, this gives us a very convenient method for computing $E(1/\nu)$.
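For small N the value of $E(1/\nu)$ under simple random sampling with replacement can be computed exactly. The sketch below is an illustration only: it assumes the power-sum closed form $E(1/\nu) = (1^{n-1} + 2^{n-1} + \cdots + N^{n-1})/N^n$ for this scheme and compares it with an enumeration of all ordered samples. The function names and the parameter `draws` (the sample size n) are hypothetical.

```python
from fractions import Fraction
from itertools import product


def inverse_moment_bruteforce(N, draws):
    """E(1/nu) by listing all N^draws equally likely ordered samples."""
    total = sum(Fraction(1, len(set(s))) for s in product(range(N), repeat=draws))
    return total / N ** draws


def inverse_moment_power_sum(N, draws):
    """Assumed closed form: (1^(n-1) + ... + N^(n-1)) / N^n with n = draws."""
    return Fraction(sum(j ** (draws - 1) for j in range(1, N + 1)), N ** draws)


for N, draws in [(2, 2), (3, 2), (3, 3), (4, 5)]:
    print(N, draws, inverse_moment_bruteforce(N, draws), inverse_moment_power_sum(N, draws))
```

The two columns agree in each listed case; the enumeration is exponential in `draws` and is not intended for realistic sample sizes.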


(b) Simple random sampling (without replacement).


Here,

$$q_{12 \ldots m} = \binom{N-m}{n} \Big/ \binom{N}{n} \ \text{ for } m \le N-n; \quad 0 \text{ otherwise.}$$
Ne сеу lead саз R) 
а) а еа о ear! 


= dc [нео енд 


(3) 1.2.3 


((ө--1)(п-Ғ2)..(М--2)/--1) 
EY. 123... (П). 1 


Combining the terms one by one, we get 


= 


1| 1 (n+1(n+2)..N _ 1 
5/2” 1.2.3...(N—n) m 


which is in agreement with the process of sampling. 


Corollary 2.2: For any integer # (52 0), it can be shown in a similar manner 
that 


Qa x А 
(у) = N'+ 5 (аба mA" —2y |. 


SNH E (тиме) (V-m (PU ии 


SONI. ... (010) 
The following theorem giving the expected values of certain functions of $\nu$, is an obvious generalization of (2.10).

Theorem: If f(Z) is any function of Z defined in the domain, $0 < Z \le N$, for which the infinite expansion in powers of Z is possible, and if the expectation can be taken term by term, then

$$E f(\nu) = f(N) + \sum_{m=1}^{N-1} \big(\Sigma_m\, q_{i_1 i_2 \ldots i_m}\big) \Big[f(N-m) - \binom{m}{1} f(N-m+1) + \cdots + (-1)^m f(N)\Big]. \qquad (2.11)$$

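Proof: Express f(Z) in the form

$$f(Z) = \sum_{t=-\infty}^{\infty} A_t Z^t.$$

By assumption

$$E[f(\nu)] = E\Big[\sum_{t=-\infty}^{\infty} A_t \nu^t\Big] = \sum_{t=-\infty}^{\infty} A_t E(\nu^t).$$

Putting the value of $E(\nu^t)$ from (2.10), we get

$$E f(\nu) = \sum_{t=-\infty}^{\infty} A_t \Big\{N^t + \sum_{m=1}^{N-1} (-1)^m \big(\Sigma_m\, q_{i_1 i_2 \ldots i_m}\big) \Delta^m (N-m)^t\Big\} = f(N) + \sum_{m=1}^{N-1} (-1)^m \big(\Sigma_m\, q_{i_1 i_2 \ldots i_m}\big) \Delta^m f(N-m),$$

which on expansion gives (2.11).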
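Formula (2.11) lends itself to a direct numerical check. The sketch below is illustrative and rests on two assumptions: the exclusion probabilities depend only on the number m of units (so that $\Sigma_m q_{i_1 \ldots i_m} = \binom{N}{m} q_{12 \ldots m}$), and the two simple random sampling schemes of the particular cases are used for `q`. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def expected_f(f, N, q):
    """Right-hand side of (2.11) when q(m) is common to every set of m units."""
    total = f(N)
    for m in range(1, N):
        # f(N-m) - C(m,1) f(N-m+1) + ... + (-1)^m f(N)
        bracket = sum((-1) ** k * comb(m, k) * f(N - m + k) for k in range(m + 1))
        total += comb(N, m) * q(m) * bracket
    return total


def q_with_replacement(N, draws):
    """q_{12...m} = ((N - m)/N)^draws for equal-probability draws with replacement."""
    return lambda m: Fraction(N - m, N) ** draws


def q_without_replacement(N, n):
    """q_{12...m} = C(N-m, n)/C(N, n); math.comb returns 0 when n > N - m."""
    return lambda m: Fraction(comb(N - m, n), comb(N, n))


def bruteforce_with_replacement(f, N, draws):
    """E f(nu) by listing all N^draws equally likely ordered samples."""
    total = sum(f(len(set(s))) for s in product(range(N), repeat=draws))
    return total / N ** draws


def reciprocal(z):
    return Fraction(1, z)


print(expected_f(reciprocal, 4, q_with_replacement(4, 3)),
      bruteforce_with_replacement(reciprocal, 4, 3))
print(expected_f(reciprocal, 6, q_without_replacement(6, 2)))
```

In the without-replacement case $\nu$ equals the sample size with probability one, so the second printed value is simply the reciprocal of the sample size.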
MOMENTS OF FIRST ORDER AUTOCORRELATION OF THE
SECOND ORDER AUTOREGRESSIVE SCHEME


By SUKUMAR MUKHERJEE
Patna University, India

SUMMARY. The paper considers a realization $\{x_t\}$ of a time series obeying Yule Scheme:

$$x_{t+2} - \alpha_1 x_{t+1} - \alpha_2 x_t = \epsilon_{t+2},$$

where the $\epsilon$'s have zero means and variance $\sigma^2$, independent of t, $\alpha_1$ and $\alpha_2$ being, of course, constants. Under the hypothesis that $\alpha_2 \ne 0$, expressions for the first and second raw moments of the first order autocorrelation (circularity of autocovariances is assumed) have been given.


1. INTRODUCTION
Let $\{x_i\}$ (i = 1, 2, ..., n) be a realization of a time-series, with theoretical mean of the series zero and the underlying model

$$x_{t+2} - \alpha_1 x_{t+1} - \alpha_2 x_t = \epsilon_{t+2},$$

where the $\epsilon$'s follow normal distribution with mean zero and variance $\sigma^2$, and $\alpha_1$ and $\alpha_2$ are constants. Giving circular definition to autocovariances the joint distribution of $r_1$ and $r_2$, uncorrected for mean, can be written in the form (Quenouille, 1949),


1 1 
Чи, № ТЕ i 
| 2 3-05 еа ВВ) Ол) 


q(ryralo o2) = ури 
Т 


where 
rtr 
А = loto 
Bg = —а1(1—95)717— 9872: 
that the form (1.1) gives correct margin: 
the distribution given in (1.3), Writing 


al moments for 7; 


Jenkins (1954) has shown 

but not for ra and suggested (1.1) in the form 
n 

pi та/910а) = pin r X1 «cod --a$— 204 (1 — оу а} i es (1.2) 

= 0, the’ transformations 7 = 71 and 


where p(r, 7) is the distribution when o4 = 9 


4) м 
и give | 
pirovard) = plr) ро) 14-24-08 —201(1—98)71— wa (1—18) 220571 (1.3) 
where > pn) = pirla = 0 = оз), 2(0)- peja = 0 = а) 
and v is independently distributed of r, with distribution 
| .. (1A) 


рф) = КК" (1—9) 
Moments of (1.4) are given by 


(2k—1)(2k—3)...1 о. 
m( 2k) — (n-I-1)(n:-3)...(n-I-2& —1) ° m'(2k 1) m'(2k). 
In particular 
= = БЕН rn 0 
Ev) = ao NC т аз азын м-н 


Since the mean and variance are generally needed for usual test of significance only 
the first and second order moments of r about the origin are quoted below. Some 
of the relations given below are obtained after very lengthy algebraic reductions; 
here the details of the derivations are omitted. 


2. SOME NOTATIONS AND RESULTS TO BE USED SUBSEQUENTLY 
Let A = 14-02-02 —201(1—о0,)ғ—– 2а and В = 2a.(1—r?)v, where r stands 
for т, 


dA 2 с 
BD, IR: bs = 2a,(1—a) = 46 
ФА A Ax! 
and tice = —4a, = А” =— (2 
' d E 4x, = Ау ie., аз ( 2 ) 
also, А,=1+әфна 2. 
If i AT = Ser ert. 
we have 
= (1-403 +08)! = д 
kı = —{А5'—1 А, 


21k, = КЕНТ) Аз? (Aj А-а; 
ЗІ, = ЦЕНЕ) 4573045) ЗК 1) 4547 Ag t7? 
alka = КИЗ АНА: GHEH IUH) 457 (A945 34-1) 4549): ete. 


т 
др = 72 We have p, (г) =the density function of r based on n observations 


calculated from a sample of independently distributed normal variables, so 
(3+1) = 
Г! pe T aw 01° ? (Dixon, 1944), 
therefore » 
28 +1 
^ (1=92)py(r) = DG күс" =°) 3 TEGS ) 1 B 
m 
Г, л йз = 


214-1 
= зрэ Риз» (r). 


Dar) = 
3. FIRST MOMENT OF r


т 


Pre) = p(r)p(o(A—Bv) 2- p(r) poA Во 


= мл {ia oet E". (Bus. yg) 


1 
50 p(r) = Јана А") BIZ (4s ) Papy (А-а 


20021-3) JA% 2 ,-t2 
Par ( A hus xs ва) 
Now A ер dr = Sk 1 ky. 1. 8. 
| s (rir { + E apa NAFA * Ў } 
1 M 1 
© Е ae 4 
and Jra * Py (r)dr = ae) А Pann (r) — жз ( 74773 Руини (rdr 


2 1 
_ —їД, Ку. ttl Gy TE 
= (ТА Pun dr IU щ Jah nan (e s] 


using this result we have from (3.1) after some algebraic reduction 


Т р , ” ” 
icto es Бетте aan a 


8(t--1) (ау q ë+) (4) =. } 


© (0-8) (%--10) (s 


Now in order that the generated series be of finite values for large n, |%| should be less 
than 1. Hence for finite n the series on the right hand side of (3.2) will converge. 


4. SECOND MOMENT OF r


From (3.1) we have 


20214-3) (А) _ 
(21:-2 (2121-4) (4-7 Puun O) 


(4.1) 


1-р) = ST A7 Рана + 
Since 2 д 142—212. we have from (4.1) 


| я 
as, AP) =le ao FE Tue 0+ 3 (ыма 
түз 
T xu (45) Роз (+...) 049) 


where Piers (7) = the density function of r based оп 2(£4-s) observations. That is 
the density function given in (3.1) where ¢ is replaced by t--s. After some 
algebraic reduction we have 


E fom p(r)dr = —2t, { а aw (4 SEE (ay 4 


ara EE 2-6 arli 4 ys) 


214-3 
2+6 


tA отно @+ se (AP) mings DERES (45 miesst. } 


(4.3) 


where зз) (1) = the first moment of r based on 2(@4-в) observations. That is the 
expression obtained from (3.2) when ¢ is replaced by 14-5 whence 


1 


jen mio a t este (e EI] 


+e gal E. ea 


а + ваз (eye ] 


es (2) D- Ee Se een (ду...) 


бора ама ко. ма 




Where c is an arbitrary constant. From initial conditions we find c = 1/2f+2 as 
- first approximation. Thus 


ЧАНАТ! 2t4-1 Di AS 21-43 [Aq A? aXe: 
т) = i. 24 0 на 0 4& (4 
21-2 ne {ж са + AF4 ( a ! 0-6 (%) 9418 ( T) 
%--5 үА0\? Г 2-+-1 4. (А 2(4--1) /А% 
du seth ee p 1 2 MAA ae : 
sce] | (1— aa) | 4447 DLE a 24-8 (8) 
ИИ е 1 
Ше oq | oy 
Particular cases. (i) Tf % = 0 and o, 5 0 we have from (3.2) and (4.5) 
d Зої ло 
m'(1) BR» 74% 2. (4.6) 
and АТЫН ee Mi a ан тала Сауд 
mQ — 5 ott (55 ә ) 
DA. АНИ a 
E На а) 
Le fie PED os 
a { Mou tee }. 2. (47) 
(ii) If $\alpha_1 = 0 = \alpha_2$ we have from (3.2) and (4.5)

$$m'(1) = 0 \qquad (4.8)$$

and

$$m'(2) = 1/(n+2). \qquad (4.9)$$


Results (4.6) and (4.7) agree with the corresponding moments given by Leipnik (1947) and (1958), Jenkins (1956), Author (1958). Results (4.8) and (4.9) agree with the corresponding moments given by Dixon (1944).

I am greatly indebted to Dr. D. N. Lal of the Department of Statistics, Patna University for his advice and encouragement during the preparation of this paper.
CORRIGENDA


A Note On Mutually Orthogonal Latin Squares: By S. S. Shrikhande, Sankhyā, 23, Series A, 115-116.


(1) Summary, line 1: should read ‘If n ≠ 4, it is proved’ instead of ‘It is proved’.


(2) Page 116, line 13: should read ‘Adding to the blocks in (2.1) further blocks’ instead of ‘Forming 2n block’.